Skip to content

Updating Add qwen3 (PR 2903) to use HF weights #2930

Merged: 22 commits merged into huggingface:main from the add-qwen branch on May 2, 2025

Conversation

@greenrazer (Collaborator) commented Apr 29, 2025

Builds off of @maximizemaxwell's PR, which was written before the weights were released.

This PR:

  • Loads the weights (see the loading sketch after this list)
  • Fixes the modeling code
  • Adds multiple Qwen3 models to the qwen example.
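
For reference, a minimal sketch of what loading HF-format weights in candle can look like. The repo id Qwen/Qwen3-0.6B, the single model.safetensors file, and the CPU/F32 choices are assumptions for illustration, not necessarily what this PR does; larger checkpoints are sharded across several safetensors files.

```rust
use candle_core::{DType, Device};
use candle_nn::VarBuilder;
use hf_hub::api::sync::Api;

fn main() -> anyhow::Result<()> {
    // Fetch the released weights from the Hugging Face Hub.
    // Repo id and file name are illustrative assumptions.
    let api = Api::new()?;
    let repo = api.model("Qwen/Qwen3-0.6B".to_string());
    let weights = repo.get("model.safetensors")?;

    // Memory-map the safetensors file into a VarBuilder that the
    // modeling code pulls tensors from by name.
    let device = Device::Cpu;
    let vb = unsafe { VarBuilder::from_mmaped_safetensors(&[weights], DType::F32, &device)? };
    let _ = vb; // a model would be built here, e.g. Model::new(&config, vb)?
    Ok(())
}
```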

Issues (2025/4/28):

  • Qwen3 doesn't work with the qwen example code by default
  • The Qwen3 modeling code needs a slight rework

@EricLBuehler (Member)
@greenrazer looks like the MoE model has the same architecture as the dense one, plus the MoE block from the v2!
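
For readers, a naive sketch of the routing idea behind such an MoE feed-forward block. The struct layout and the Linear stand-ins for full expert MLPs are assumptions rather than candle's actual qwen2_moe code, and every expert runs here for clarity, whereas a real block keeps only the top-k experts per token.

```rust
use candle_core::{D, Result, Tensor};
use candle_nn::{linear_no_bias, ops::softmax, Linear, Module, VarBuilder};

// Simplified sparse-MoE feed-forward block (illustrative names only).
struct MoeBlock {
    gate: Linear,         // router: hidden_size -> num_experts logits
    experts: Vec<Linear>, // stand-ins for the expert MLPs
}

impl MoeBlock {
    fn new(hidden: usize, num_experts: usize, vb: VarBuilder) -> Result<Self> {
        let gate = linear_no_bias(hidden, num_experts, vb.pp("gate"))?;
        let experts = (0..num_experts)
            .map(|i| linear_no_bias(hidden, hidden, vb.pp(format!("experts.{i}"))))
            .collect::<Result<Vec<_>>>()?;
        Ok(Self { gate, experts })
    }

    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        // Per-token routing weights over the experts.
        let weights = softmax(&self.gate.forward(xs)?, D::Minus1)?;
        let mut out = xs.zeros_like()?;
        for (i, expert) in self.experts.iter().enumerate() {
            // Weight each expert's output by its routing probability.
            let w = weights.narrow(D::Minus1, i, 1)?;
            out = (out + expert.forward(xs)?.broadcast_mul(&w)?)?;
        }
        Ok(out)
    }
}
```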

@greenrazer marked this pull request as ready for review April 30, 2025 00:09
@greenrazer (Collaborator, Author) commented Apr 30, 2025

I couldn't fully test the MoE model because it won't run on my computer, so if anyone has a beefier rig, that would be great :).

Also, the model never terminates. I tried setting the EOS token to <|im_end|>. It's probably an issue with the rotary embeddings, like last time. I'll have to take a look later.
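
As a point of reference, a minimal sketch of the stop condition in question: cut a stream of sampled token ids at Qwen's chat end token. Only the <|im_end|> string comes from the discussion; the function name and shape are illustrative, not this PR's code.

```rust
use tokenizers::Tokenizer;

// Truncate a stream of sampled token ids at the chat EOS token.
fn take_until_eos(tokenizer: &Tokenizer, sampled: impl IntoIterator<Item = u32>) -> Vec<u32> {
    // Look up the id of Qwen's end-of-turn token in the vocabulary.
    let eos = tokenizer.token_to_id("<|im_end|>");
    let mut out = Vec::new();
    for id in sampled {
        if Some(id) == eos {
            break; // the reported bug is that generation never reaches this
        }
        out.push(id);
    }
    out
}
```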

@EricLBuehler (Member)

> I couldn't fully test the MoE model because it won't run on my computer, so if anyone has a beefier rig, that would be great :).

Will test!

@EricLBuehler (Member) commented Apr 30, 2025

> > I couldn't fully test the MoE model because it won't run on my computer, so if anyone has a beefier rig, that would be great :).
>
> Will test!

@greenrazer it seems that it is indeed broken for the MoE, while the dense model roughly works. Let me know if there is any information you need.

@greenrazer (Collaborator, Author) commented Apr 30, 2025

> > > I couldn't fully test the MoE model because it won't run on my computer, so if anyone has a beefier rig, that would be great :).
> >
> > Will test!
>
> @greenrazer it seems that it is indeed broken for the MoE, while the dense model roughly works. Let me know if there is any information you need.

Thank you @EricLBuehler!

Since the MoE model isn’t currently working and I don’t have the resources to debug it efficiently, I’m planning to remove it from this PR and move it to a draft instead. That way, someone else can build on it if they’d like, or I can revisit it later, especially if smaller MoE variants are released.

Edit: the draft PR is here: #2934

@greenrazer mentioned this pull request Apr 30, 2025
@LaurentMazare merged commit 1fdfb58 into huggingface:main May 2, 2025
9 checks passed
@greenrazer deleted the add-qwen branch May 6, 2025 20:16