[Kernel] Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 #16753

Ximingwang-09 · 2025-04-17T03:29:02Z

Following sgl-project/sglang#5196, this PR adds fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on H20. The tuned configs give a significant throughput improvement over the default configurations.
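For readers unfamiliar with these files, here is a minimal sketch of what a fused MoE tuning config might look like and how one entry could be selected at run time. It assumes the layout used by the existing tuned configs in vLLM: a JSON object keyed by token count, where each value holds Triton launch parameters. The numbers below are invented for illustration and are not the values added in this PR; `load_configs` and `pick_config` are hypothetical helpers, not vLLM APIs.

```python
import json

# Illustrative values only: NOT the tuned numbers from this PR.
# Assumed layout: keys are token counts (M), values are Triton launch parameters.
EXAMPLE_CONFIG_JSON = """
{
  "1":    {"BLOCK_SIZE_M": 16,  "BLOCK_SIZE_N": 64,  "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 1,  "num_warps": 4, "num_stages": 3},
  "64":   {"BLOCK_SIZE_M": 64,  "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 16, "num_warps": 4, "num_stages": 3},
  "1024": {"BLOCK_SIZE_M": 128, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 32, "num_warps": 8, "num_stages": 4}
}
"""


def load_configs(text: str) -> dict[int, dict[str, int]]:
    """Parse the JSON text and convert the string keys to integer token counts."""
    return {int(m): params for m, params in json.loads(text).items()}


def pick_config(configs: dict[int, dict[str, int]], num_tokens: int) -> dict[str, int]:
    """Hypothetical helper: return the entry whose key is closest to num_tokens."""
    nearest = min(configs, key=lambda m: abs(m - num_tokens))
    return configs[nearest]


configs = load_configs(EXAMPLE_CONFIG_JSON)
print(pick_config(configs, 50))
```

Under this assumed scheme, a batch of 50 tokens would use the entry tuned for 64, since that is the nearest key. Tuning per device matters because the best block sizes and warp counts depend on the GPU's compute and memory characteristics, which is why H20 gets its own files.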

github-actions · 2025-04-17T03:29:10Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Signed-off-by: ximing.wxm <[email protected]>

mgoin

Thanks!

…/R1 on NVIDIA H20 (vllm-project#16753) Signed-off-by: ximing.wxm <[email protected]> Co-authored-by: ximing.wxm <[email protected]> Signed-off-by: Yang Wang <[email protected]>

ayrnb · 2025-04-27T08:34:10Z

H20 96G or 141G？

Ximingwang-09 · 2025-04-27T10:19:32Z

H20 96G or 141G？

96G

handsome-chips · 2025-04-29T02:50:58Z

Is there a configuration for the H800?

github-project-automation bot added this to DeepSeek V3/R1 Apr 17, 2025

github-project-automation bot moved this to Backlog in DeepSeek V3/R1 Apr 17, 2025

Ximingwang-09 mentioned this pull request Apr 17, 2025

[Kernel] Add more tuned configs #14877

Merged

ximing.wxm added 2 commits April 17, 2025 12:55

H20 dtype fp8_w8a8 fused MoE kernel tuning configs

59f2b45

Signed-off-by: ximing.wxm <[email protected]>

pre-commit

e25c495

Signed-off-by: ximing.wxm <[email protected]>

Ximingwang-09 force-pushed the tuned_config branch from 030ebdb to e25c495 April 17, 2025 04:55

jeejeelee requested a review from simon-mo April 17, 2025 06:52

mgoin approved these changes Apr 17, 2025

github-project-automation bot moved this from Backlog to In progress in DeepSeek V3/R1 Apr 17, 2025

mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) Apr 17, 2025

DarkLight1337 merged commit a018e55 into vllm-project:main Apr 17, 2025
60 checks passed

github-project-automation bot moved this from In progress to Done in DeepSeek V3/R1 Apr 17, 2025

Ximingwang-09 deleted the tuned_config branch April 18, 2025 02:09
