LoRA for experts layers in MoE #2527

ebsmothers opened this issue May 1, 2025 · 1 comment


@ebsmothers

ebsmothers commented May 1, 2025

Feature request

The ability to apply LoRA (or other adapters) to experts in MoE models.

Motivation

Mixture-of-experts models with token-choice routing contain an FFN within each expert, often implemented as a batched matmul over all experts (ref from Llama4). This is a bit different from vanilla FFNs in that the parameters are represented as 3D nn.Parameters rather than nn.Linears. However, given that it's fairly common to apply LoRA to vanilla FFNs, it would also be useful to tune the experts of an MoE model with PEFT. (There are some challenges here, e.g. the use of nn.Parameters probably precludes doing this via a module swap directly on nn.Linears.)
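
To make the shape difference concrete, here is a minimal, self-contained sketch (not PEFT or torchtune code; the module and parameter names are made up) of a grouped-experts FFN whose weights are 3D nn.Parameters, plus a hypothetical LoRA variant that adds per-expert low-rank factors through the same batched matmul. Only the first projection is adapted, to keep the sketch short.

```python
import torch
import torch.nn as nn


class GroupedExperts(nn.Module):
    """Toy grouped-experts FFN: weights are 3D nn.Parameters, one slice per expert."""

    def __init__(self, num_experts: int, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(num_experts, dim, hidden_dim) * 0.02)
        self.w2 = nn.Parameter(torch.randn(num_experts, hidden_dim, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_experts, tokens_per_expert, dim) after routing/permutation
        h = torch.bmm(x, self.w1).relu()   # (num_experts, tokens_per_expert, hidden_dim)
        return torch.bmm(h, self.w2)       # (num_experts, tokens_per_expert, dim)


class LoRAGroupedExperts(nn.Module):
    """Hypothetical adapter: per-expert low-rank factors, also stored as 3D nn.Parameters."""

    def __init__(self, base: GroupedExperts, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.requires_grad_(False)    # freeze the expert weights
        num_experts, dim, hidden_dim = base.w1.shape
        self.scaling = alpha / rank
        # A is initialized randomly, B at zero, so the adapter starts as a no-op.
        self.lora_a = nn.Parameter(torch.randn(num_experts, dim, rank) * 0.02)
        self.lora_b = nn.Parameter(torch.zeros(num_experts, rank, hidden_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = torch.bmm(torch.bmm(x, self.lora_a), self.lora_b) * self.scaling
        h = (torch.bmm(x, self.base.w1) + delta).relu()
        return torch.bmm(h, self.base.w2)


# Quick shape check with dummy routed tokens
experts = GroupedExperts(num_experts=4, dim=16, hidden_dim=32)
adapted = LoRAGroupedExperts(experts, rank=4)
tokens = torch.randn(4, 8, 16)             # (num_experts, tokens_per_expert, dim)
print(adapted(tokens).shape)               # torch.Size([4, 8, 16])
```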

Your contribution

Happy to help with thoughts on the design. We have a version of this in torchtune (ref) and would love to interoperate with PEFT if this is something you're interested in supporting!

@githubnemo
Collaborator

Hey @ebsmothers,

thanks for drawing attention to LoRA adapters for MoE experts :)

If I understand correctly, we would need a specific layer adapter for Llama4TextExperts, since there is currently no established interface for how grouped experts are implemented (in contrast to, say, multi-head attention). This would also remove the need to deal with nn.Parameters directly, since we would target the whole module (although there would be a way to handle nn.Parameters, as done for nn.MultiheadAttention).
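
For illustration, here is a rough sketch of what targeting the whole module could look like. This is not the PEFT API; ExpertsLoRAWrapper and inject_expert_adapters are hypothetical names, and the adapter parameters themselves are omitted (they would mirror the 3D expert weights, as in the sketch above). It just shows a module swap on the grouped-experts module (e.g. Llama4TextExperts) rather than on individual nn.Linears.

```python
import torch.nn as nn


class ExpertsLoRAWrapper(nn.Module):
    """Hypothetical wrapper that adapts a whole experts module instead of nn.Linears."""

    def __init__(self, base_module: nn.Module, rank: int = 8):
        super().__init__()
        self.base_module = base_module
        self.base_module.requires_grad_(False)  # keep the base experts frozen
        self.rank = rank
        # LoRA parameters for the 3D expert weights would be created here.

    def forward(self, *args, **kwargs):
        # A real adapter would add the low-rank update to the expert outputs here.
        return self.base_module(*args, **kwargs)


def inject_expert_adapters(model: nn.Module, target_type: type) -> nn.Module:
    """Recursively swap every module of `target_type` for a wrapped version."""
    for name, child in model.named_children():
        if isinstance(child, target_type):
            setattr(model, name, ExpertsLoRAWrapper(child))
        else:
            inject_expert_adapters(child, target_type)
    return model


# Usage would then be along the lines of:
#   inject_expert_adapters(model, Llama4TextExperts)
```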
