[NVIDIA] Add Cutlass MLA backend #1031

kaixih · 2025-04-22T20:55:40Z

This PR adds a CUTLASS backend to the FlashInfer BatchMLAPagedAttentionWrapper.

yzh119

LGTM, let's unify the interface later.

kaixih and others added 4 commits April 22, 2025 20:52

Add cutlass mla

7d1489d

skip tests for older gpus

c481f2e

formatting

7f455e0

format

f9f9e43

yzh119 approved these changes Apr 23, 2025

yzh119 merged commit 26ebac7 into flashinfer-ai:main Apr 23, 2025
2 checks passed