Updating Add qwen3 (PR 2903) to use HF weights (#2930)
* add Qwen3.rs
* fixed compile error
* attempting to get pr 2903 working with qwen weights
* different qwen variants working
* added moe model
* clippy
* added additional eos token
* translated Korean comments to English as well as I can
* removed specialized Qwen3RmsNorm and replaced with generic Candle RmsNorm
* replaced custom repeat_kv implementation with candle's repeat_kv implementation
* replace linear with linear_b in attention initialization
* replaced custom kv_cache implementation with candle kv_cache
* style
* replaced explicit broadcast add with normal add in decoder layer
* removed keeping the Rotary embedding layer in the model struct
* used tie_word_embeddings bool from config instead of relying on existence of weights for lm head in CausalLM
* removed duplicate code from qwen3_moe
* removed sliding window from qwen3 attention
* removed MoE code
* removed unused option
* Fixed Typo
Co-authored-by: Laurent Mazare <[email protected]>
* fixed tie word embeddings to use the correct embedding weights instead of the opposite
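The two tie_word_embeddings items above can be illustrated with a minimal, dependency-free sketch. It is not the code from this commit: `Config`, `Weights`, and `select_lm_head` are hypothetical names introduced only for illustration. It assumes that the config flag alone decides whether the output head reuses the token embedding weights.

```rust
/// Hypothetical stand-in for the relevant part of a model config.
struct Config {
    tie_word_embeddings: bool,
}

/// Hypothetical stand-in for a weight matrix; only its name matters here.
#[derive(Debug, Clone, PartialEq)]
struct Weights {
    name: String,
}

/// Picks the weights for the output projection (lm head).
///
/// The decision is driven by the config flag, not by whether separate
/// lm head weights happen to exist in the checkpoint.
fn select_lm_head(
    cfg: &Config,
    embed_tokens: &Weights,
    lm_head: Option<&Weights>,
) -> Result<Weights, String> {
    if cfg.tie_word_embeddings {
        // Tied: reuse the token embedding weights for the output head.
        Ok(embed_tokens.clone())
    } else {
        // Untied: separate lm head weights are required.
        lm_head.cloned().ok_or_else(|| {
            "lm head weights missing while tie_word_embeddings is false".to_string()
        })
    }
}
```

With the flag set, the function returns the embedding weights even if separate lm head weights are present. With the flag unset, it returns the lm head weights or reports an error if they are missing.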
---------
Co-authored-by: Max <[email protected]>
Co-authored-by: Laurent Mazare <[email protected]>