FlashMLA PyTorch

PyTorch implementation of FlashMLA.

FlashMLA is an efficient MLA (Multi-head Latent Attention) decoding kernel for Hopper GPUs, optimized for serving variable-length sequences. Currently released: BF16 support and a paged KV cache with a block size of 64.
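
To make the paged cache layout concrete, here is a minimal pure-Python sketch of how variable-length sequences could map onto fixed 64-token blocks. It is an illustration only, not this repository's API: the helper names (`blocks_needed`, `build_block_table`, `locate_token`) and the sequential block allocator are assumptions introduced for the example.

```python
BLOCK_SIZE = 64  # block size stated in the description above


def blocks_needed(seq_len, block_size=BLOCK_SIZE):
    """Number of fixed-size cache blocks required to hold seq_len tokens."""
    return (seq_len + block_size - 1) // block_size  # ceiling division


def build_block_table(seq_lens, block_size=BLOCK_SIZE):
    """Assign physical block ids to each sequence.

    Hypothetical allocator: blocks are handed out sequentially. A real
    serving system would typically draw them from a free list instead.
    """
    table = []
    next_free = 0
    for seq_len in seq_lens:
        count = blocks_needed(seq_len, block_size)
        table.append(list(range(next_free, next_free + count)))
        next_free += count
    return table


def locate_token(block_table, seq_idx, token_pos, block_size=BLOCK_SIZE):
    """Map a logical token position to (physical block id, offset in block)."""
    logical_block, offset = divmod(token_pos, block_size)
    return block_table[seq_idx][logical_block], offset


if __name__ == "__main__":
    seq_lens = [100, 64, 1]  # three sequences of different lengths
    table = build_block_table(seq_lens)
    print(table)                       # [[0, 1], [2], [3]]
    print(locate_token(table, 0, 70))  # (1, 6)
```

Because every block has the same size, a sequence only wastes space in its last block, which is the usual reason to page the cache when serving sequences of very different lengths.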

Repository files:

LICENSE
README.md
flash_mla.h
flash_mla_interface.py
requirements.txt

Repository: Yuan-ManX/FlashMLA-PyTorch