Updating Add qwen3 (PR 2903) to use HF weights #2930
Conversation
@greenrazer looks like the MoE model has the same architecture as the dense one, plus the MoE block from V2!
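For context, an MoE block of the kind mentioned above routes each token through a small top-k subset of experts chosen by a softmax gate over router logits. A minimal sketch of that routing step, using only illustrative names (this is not candle's actual Qwen2/Qwen3 MoE code):

```rust
/// Hedged sketch: pick the top-k experts for one token from router logits,
/// returning (expert index, renormalized gate weight) pairs.
fn top_k_experts(router_logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    // Softmax over the router logits (numerically stabilized by max-subtraction).
    let max = router_logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = router_logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut probs: Vec<(usize, f32)> = exps
        .iter()
        .enumerate()
        .map(|(i, &e)| (i, e / sum))
        .collect();
    // Keep the k highest-probability experts and renormalize their weights
    // so the selected gate weights sum to 1.
    probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    probs.truncate(k);
    let norm: f32 = probs.iter().map(|&(_, p)| p).sum();
    probs.into_iter().map(|(i, p)| (i, p / norm)).collect()
}

fn main() {
    let logits = [0.1f32, 2.0, -1.0, 0.5];
    let picked = top_k_experts(&logits, 2);
    println!("{:?}", picked); // two (index, weight) pairs, weights summing to 1
}
```

The token's MoE output is then the weighted sum of those k expert MLPs; the dense layers stay identical to the non-MoE model.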
I couldn't fully test the MoE model because it can't run on my computer, so if anyone has a beefier rig, that'd be great :). Also, the model never terminates. I tried adding an EOS token to
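On the non-termination point: a generation loop normally stops as soon as the sampled token equals the tokenizer's EOS id. A hedged sketch of that check, where the loop shape is illustrative (not this repo's example code) and `151645` is assumed to be Qwen's `<|im_end|>` id:

```rust
/// Hedged sketch of a sampling loop with an EOS stop check.
/// `next_token` stands in for whatever produces the next sampled token id.
fn generate(
    mut next_token: impl FnMut() -> u32,
    eos_token_id: u32,
    max_tokens: usize,
) -> Vec<u32> {
    let mut out = Vec::new();
    for _ in 0..max_tokens {
        let tok = next_token();
        if tok == eos_token_id {
            break; // without this check, the loop only stops at max_tokens
        }
        out.push(tok);
    }
    out
}

fn main() {
    // Simulated token stream; 151645 plays the role of the EOS id here.
    let seq = [5u32, 7, 9, 151645, 3];
    let mut it = seq.iter().copied();
    let generated = generate(move || it.next().unwrap(), 151645, 16);
    println!("{:?}", generated); // stops before the EOS token
}
```

If the model's config never sets the right EOS id, or sampling never emits it, the loop runs to `max_tokens`, which matches the "never terminates" symptom described above.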
Will test!
@greenrazer it seems that the MoE model is indeed broken, while the dense model roughly works. Let me know if there is any information you need.
…ence of weights for lm head in CausalLM
Thank you @EricLBuehler! Since the MoE model isn't currently working and I don't have the resources to debug it efficiently, I'm planning to remove it from this PR and move it to a draft instead. That way, someone else can build on it if they'd like, or I can revisit it later, especially if smaller MoE variants are released. Edit: PR here #2934
Co-authored-by: Laurent Mazare <[email protected]>
…d of the opposite
Builds on @maximizemaxwell's PR, which was written before the weights were released.
This PR:
Issues (2025/4/28):
- Qwen3 doesn't work with qwen example code by default
- Qwen3 modeling code needs slight rework