Feature request
The ability to apply LoRA (or other adapters) to experts in MoE models.
Motivation
Mixture-of-experts models with token-choice routing contain an FFN within each expert, and these are often implemented as a single batched matmul over all experts (ref from Llama4). This differs from vanilla FFNs in that the weights are stored as 3D nn.Parameters rather than as nn.Linears. However, given that it's common to apply LoRA to vanilla FFNs, it would also be useful to tune the experts of an MoE model with PEFT. (There are some challenges here, e.g. the use of nn.Parameters likely rules out doing this via a module swap directly on nn.Linears.)
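To make the shape of the problem concrete, here is a minimal sketch of the kind of grouped-experts FFN described above, plus a batched LoRA variant of it. The class names, the up_proj/down_proj attributes, and the SiLU activation are illustrative assumptions, not the actual Llama4 or torchtune implementation; routing and top-k gating are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedExperts(nn.Module):
    """Toy grouped-experts FFN: all expert weights live in 3D nn.Parameters."""

    def __init__(self, num_experts: int, dim: int, hidden_dim: int):
        super().__init__()
        # (num_experts, dim, hidden_dim) instead of num_experts separate nn.Linears
        self.up_proj = nn.Parameter(torch.randn(num_experts, dim, hidden_dim) * 0.02)
        self.down_proj = nn.Parameter(torch.randn(num_experts, hidden_dim, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_experts, tokens_per_expert, dim), tokens already grouped by expert
        h = F.silu(torch.bmm(x, self.up_proj))  # one batched matmul over all experts
        return torch.bmm(h, self.down_proj)


class LoRAGroupedExperts(nn.Module):
    """LoRA on the up projection only, with the A/B factors also batched per expert."""

    def __init__(self, base: GroupedExperts, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the base expert weights
        num_experts, dim, hidden_dim = base.up_proj.shape
        self.scaling = alpha / rank
        # one (A, B) pair per expert, stored as 3D tensors to keep the bmm pattern
        self.lora_a = nn.Parameter(torch.randn(num_experts, dim, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(num_experts, rank, hidden_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.bmm(x, self.base.up_proj)
        h = h + torch.bmm(torch.bmm(x, self.lora_a), self.lora_b) * self.scaling
        return torch.bmm(F.silu(h), self.base.down_proj)


# usage: x holds tokens_per_expert routed tokens for each expert
x = torch.randn(4, 16, 32)  # (num_experts, tokens_per_expert, dim)
experts = LoRAGroupedExperts(GroupedExperts(num_experts=4, dim=32, hidden_dim=64))
print(experts(x).shape)  # torch.Size([4, 16, 32])
```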
Your contribution
Happy to help with thoughts on the design. We have a version of this in torchtune (ref) and would love to interoperate with PEFT if this is something you're interested in supporting!
Thanks for drawing attention to LoRA adapters for MoE experts :)
If I understand correctly, we would need a dedicated layer adapter for Llama4TextExperts, since there is currently no established interface for how grouped experts are implemented (in contrast to, say, multi-head attention). Targeting the whole module would also remove the need to deal with the nn.Parameters directly (although there is a way to handle nn.Parameters, as done for nn.MultiheadAttention).
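To illustrate the module-level targeting idea, below is a minimal sketch, not PEFT's actual API: a wrapper that would be swapped in for the whole experts module (e.g. Llama4TextExperts) so that the base module keeps its own forward logic, while the adapter substitutes an adapted 3D weight via torch.func.functional_call. The weight_name default ("gate_up_proj") and the assumed (num_experts, in_dim, out_dim) layout would need to be checked against the actual module.

```python
import torch
import torch.nn as nn


class ExpertsModuleLoRA(nn.Module):
    """Hypothetical module-level adapter: wraps the whole grouped-experts module
    and adds a per-expert low-rank delta to one of its 3D weights."""

    def __init__(self, base: nn.Module, weight_name: str = "gate_up_proj",
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.weight_name = weight_name
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the LoRA factors are trained
        weight = getattr(base, weight_name)  # assumed: (num_experts, in_dim, out_dim)
        num_experts, in_dim, out_dim = weight.shape
        self.scaling = alpha / rank
        self.lora_a = nn.Parameter(torch.randn(num_experts, in_dim, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(num_experts, rank, out_dim))

    def forward(self, *args, **kwargs):
        weight = getattr(self.base, self.weight_name)
        delta = torch.bmm(self.lora_a, self.lora_b) * self.scaling
        # run the unmodified base forward with the adapted 3D weight substituted in,
        # without mutating the base module's parameters
        return torch.func.functional_call(
            self.base, {self.weight_name: weight + delta}, args, kwargs
        )
```

Targeting the module rather than individual nn.Linears keeps the batched-matmul code path intact and sidesteps the nn.Parameter issue mentioned in the motivation; the trade-off is that the adapter has to know which 3D weights to adapt for each supported experts implementation.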