
Yuan-ManX/FlashMLA-PyTorch

FlashMLA PyTorch

PyTorch implementation of FlashMLA.

FlashMLA is an efficient MLA (Multi-head Latent Attention) decoding kernel for Hopper GPUs, optimized for serving variable-length sequences. Currently released: BF16 support and a paged KV cache with a block size of 64.
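A paged KV cache stores each sequence's cached tokens in fixed-size blocks (64 tokens here). Those blocks need not be contiguous in memory, and a per-sequence block table maps logical token positions to physical blocks. Below is a minimal pure-Python sketch of that address translation plus one decode step of attention over the cache. It is not FlashMLA's kernel or this repository's code. The names `physical_slot`, `paged_decode_attention`, `kv_pool`, and `block_table`, the nested-list layout, and the choice to reuse the first `dv` entries of each cached vector as the value are all illustrative assumptions. That last choice loosely echoes MLA, which caches a single compressed latent per token.

```python
import math
from typing import List, Tuple

BLOCK_SIZE = 64  # tokens per KV-cache block, matching the block size stated above


def physical_slot(block_table: List[int], pos: int) -> Tuple[int, int]:
    """Translate a logical token position into (physical block id, offset in block)."""
    return block_table[pos // BLOCK_SIZE], pos % BLOCK_SIZE


def paged_decode_attention(
    q: List[float],
    kv_pool: List[List[List[float]]],
    block_table: List[int],
    seq_len: int,
    dv: int,
) -> List[float]:
    """Single-query (decode step) attention over a paged cache.

    Hypothetical layout: kv_pool[block][offset] holds one cached vector per token.
    The full vector acts as the key; its first `dv` entries act as the value.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    scale = 1.0 / math.sqrt(len(q))

    # Scaled dot-product scores, gathering keys through the block table.
    scores = []
    for pos in range(seq_len):
        blk, off = physical_slot(block_table, pos)
        k = kv_pool[blk][off]
        scores.append(scale * sum(a * b for a, b in zip(q, k)))

    # Numerically stable softmax over the valid positions only.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)

    # Weighted sum of values, again gathered through the block table.
    out = [0.0] * dv
    for pos, w in enumerate(weights):
        blk, off = physical_slot(block_table, pos)
        v = kv_pool[blk][off][:dv]
        for i in range(dv):
            out[i] += (w / total) * v[i]
    return out
```

For example, a 70-token sequence with `block_table = [2, 0]` keeps tokens 0 to 63 in physical block 2 and tokens 64 to 69 in block 0. A real kernel does the same lookup, but batched on the GPU and without materializing the gathered keys.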
