Integrate MLX SDPA kernels with mask #2820
Conversation
Great work 🎉
I really only have two comments.
One is that, at this point, it's really time to start precompiling kernels: not for performance, but for the sake of project structure and maintainability. I have that ready to go, so I'll open a PR soon.
The second is that I wonder whether it makes sense to move more of the logic from candle-nn into candle-metal-kernels. The standard asserts etc. are fine where they are, but figuring out the correct kernel to call is arguably a concept that belongs inside candle-metal-kernels.
That's open for discussion obviously.
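For concreteness, here is a minimal sketch of what such a selection helper inside candle-metal-kernels could look like. The function name, its inputs, and the kernel-naming scheme are all invented for illustration and do not reflect the actual candle-metal-kernels API; the point is only that shape-based kernel selection could live behind one entry point rather than in candle-nn:

```rust
// Hypothetical sketch: names and the kernel-naming scheme are invented
// for illustration and are not the real candle-metal-kernels API.

/// Pick a fused SDPA kernel name from the problem shape alone, so callers
/// (e.g. candle-nn) never hard-code kernel-selection logic themselves.
fn select_sdpa_kernel(seq_len: usize, head_dim: usize, causal: bool) -> Option<String> {
    // Decode (a single query token) uses the vector kernel; prompts use
    // the full kernel, optionally its causal variant.
    let variant = if seq_len == 1 {
        "vector"
    } else if causal {
        "full_causal"
    } else {
        "full"
    };
    // Only head dims with specialized kernels are supported here.
    match head_dim {
        64 | 96 | 128 => Some(format!("sdpa_{variant}_hd{head_dim}")),
        _ => None, // caller falls back to the unfused attention path
    }
}

fn main() {
    assert_eq!(
        select_sdpa_kernel(1, 128, false).as_deref(),
        Some("sdpa_vector_hd128")
    );
    assert_eq!(select_sdpa_kernel(512, 80, true), None);
}
```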
In any case, this is ready to merge in my opinion, provided all tests pass and it runs smoothly.
This PR integrates kernel developments from ml-explore/mlx#1924.
Specifically, our candle_nn::ops::sdpa function now dispatches to optimized implementations for both the prompt (prefill) and decode cases. There is also an option for causal masking, removing the need to materialize the mask. Overall, this means that we can fuse the attention operation on Metal for both the prompt and decode phases!
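For reference, a minimal usage sketch of the existing entry point, assuming the pre-PR signature `sdpa(q, k, v, scale, softcapping)`; how this PR exposes the new causal-mask option may differ, so treat the call below as illustrative rather than the final API:

```rust
// Minimal sketch, assuming candle_nn::ops::sdpa(q, k, v, scale, softcapping);
// the causal-mask option added by this PR is not shown because its exact
// surface here is an assumption.
use candle::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::new_metal(0)?;
    // (batch, num_heads, seq_len, head_dim)
    let q = Tensor::randn(0f32, 1., (1, 32, 128, 64), &dev)?.to_dtype(DType::F16)?;
    let k = Tensor::randn(0f32, 1., (1, 32, 128, 64), &dev)?.to_dtype(DType::F16)?;
    let v = Tensor::randn(0f32, 1., (1, 32, 128, 64), &dev)?.to_dtype(DType::F16)?;
    let scale = 1.0 / (64f32).sqrt();
    // softcapping = 1.0 disables logit softcapping.
    let out = candle_nn::ops::sdpa(&q, &k, &v, scale, 1.0)?;
    println!("{:?}", out.dims()); // expected: [1, 32, 128, 64]
    Ok(())
}
```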
I will update this PR further with benchmarks, but it is tested and working in my fork through mistral.rs.