Support new arch of GLM4 models #2991
I think the new example you created should be combined with the existing glm4 example using some switching logic similar to the gemma example here. Otherwise, it looks good!
Thanks for the feedback, I will revise this.
…te bugs for old GLM4
As suggested, I’ve integrated both the old and new GLM4 into a single example, using the |
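One way to picture a single example that serves both architectures is a small variant enum parsed from a command-line value. The following is a minimal std-only sketch, not the code in this PR: the `Which` enum, its variant names, and the `glm4-old` / `glm4-new` strings are all hypothetical, and the real example may rely on a CLI parsing crate instead.

```rust
use std::str::FromStr;

/// Hypothetical selector for the two GLM4 architectures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Which {
    /// The original GLM-4 architecture.
    Glm4Old,
    /// The GLM-4-0414 series architecture.
    Glm4New,
}

impl FromStr for Which {
    type Err = String;

    /// Maps a (hypothetical) command-line value to a variant.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "glm4-old" => Ok(Which::Glm4Old),
            "glm4-new" => Ok(Which::Glm4New),
            other => Err(format!("unknown model variant: {other}")),
        }
    }
}

impl Which {
    /// Stand-in for the point where the example would build one model or the other.
    pub fn description(self) -> &'static str {
        match self {
            Which::Glm4Old => "original GLM-4 architecture",
            Which::Glm4New => "GLM-4-0414 architecture",
        }
    }
}
```

With this shape, the rest of the example can match on the parsed variant once and share everything else (tokenizer setup, sampling loop) between the two models.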
The functionality looks good overall.
I noticed this introduces a dependency on the either crate. To keep our dependency tree minimal, please avoid adding new dependencies if possible. A simple custom enum could replace the Either usage here.
Thanks for working on this PR! :)
Thanks for the comments. I have removed the either crate by using a custom EosTokenId struct and deserialization pattern.
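For readers unfamiliar with the pattern, a custom type that stands in for Either could look roughly like the sketch below. This is an assumption about the shape, not the code merged in this PR: it models the value as an enum with hypothetical `Single` and `Multiple` variants, the `contains` helper is invented for illustration, and the token ids in the usage are made up. Deserialization is left out to keep the sketch free of dependencies; with serde, an enum like this is commonly derived with the `#[serde(untagged)]` attribute so that either a bare number or a list is accepted.

```rust
/// Hypothetical replacement for an `Either<u32, Vec<u32>>` field:
/// a config may give one end-of-sequence token id or several.
#[derive(Debug, Clone, PartialEq)]
pub enum EosTokenId {
    /// A single end-of-sequence token id.
    Single(u32),
    /// Several token ids, any of which ends generation.
    Multiple(Vec<u32>),
}

impl EosTokenId {
    /// Returns true if `token` is one of the end-of-sequence ids.
    pub fn contains(&self, token: u32) -> bool {
        match self {
            EosTokenId::Single(id) => *id == token,
            EosTokenId::Multiple(ids) => ids.contains(&token),
        }
    }
}
```

A generation loop could then stop with a check such as `if eos.contains(next_token) { break; }` regardless of which form the config used.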
The latest GLM-4 (0414 version) uses a different architecture. The existing GLM-4 implementation is not compatible with the GLM-4-0414 series. This PR adds support for the new architecture.
Tested case
cargo run --example glm4_new --release --features cuda -- --weight-path /home/data/GLM-4-9B-0414 --prompt "How are you today?"