musa: enable fp16 mma (all) and cublas on qy2 #13842

yeahdongcn · 2025-05-28T06:40:57Z

Make sure to read the contributing guidelines before submitting a PR

This PR is a rework of #13149.
Regarding the cublasGemmBatchedEx issue on the MTT S4000, I will continue investigating it with our MUBLAS team to identify a proper solution.

Testing Done

All tests below were performed on the MTT S80.

Build completed successfully

./build/bin/test-backend-ops passed

root@eab90abe42cc:/ws# ./build/bin/test-backend-ops 
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 MUSA devices:
  Device 0: MTT S80, compute capability 2.1, VMM: yes
Testing 2 devices

Backend 1/2: MUSA0
  Device description: MTT S80
  Device memory: 16297 MB (15723 MB free)

  ABS(type=f16,ne_a=[128,2,2,2],v=0): OK
  ABS(type=f16,ne_a=[5,7,11,13],v=0): OK
  ...
  CROSS_ENTROPY_LOSS(type=f32,ne=[10,5,4,3]): OK
  CROSS_ENTROPY_LOSS(type=f32,ne=[30000,1,1,1]): OK
  CROSS_ENTROPY_LOSS_BACK(type=f32,ne=[10,5,4,3]): OK
  CROSS_ENTROPY_LOSS_BACK(type=f32,ne=[30000,1,1,1]): OK
  OPT_STEP_ADAMW(type=f32,ne=[10,5,4,3]): OK
  5527/5527 tests passed
  Backend MUSA0: OK

Backend 2/2: CPU
  Skipping CPU backend
2/2 backends passed
OK

Tested DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf, qwen3_8b_q4_k_m.gguf, nvidia-llama-3_1-nemotron-nano-8b-v1-q4_k_m.gguf with or without the -fa flag

ggml/src/ggml-cuda/common.cuh

ggml/src/ggml-cuda/ggml-cuda.cu

yeahdongcn · 2025-06-04T10:22:05Z

Let's hold off on merging this until I upgrade to the upcoming MUSA SDK and verify that everything works as expected.

Signed-off-by: Xiaodong Ye <[email protected]>

Co-authored-by: Johannes Gäßler <[email protected]>

Signed-off-by: Xiaodong Ye <[email protected]>

yeahdongcn · 2025-06-25T06:14:59Z

Rebased onto upstream/master.

* mamba2-sync: (22 commits) recurrent : call balloc split_reset() in init_batch() (ggml-org#14414) ggml : add ggml_set_rows (ggml-org#14274) convert : fix broken sentencepiece vocab (ggml-org#14416) mamba : fix mismatched new and delete size for llm_build_mamba model : gemma3n text-only (ggml-org#14400) cmake: regen vulkan shaders when shaders-gen sources change (ggml-org#14398) llama : return mistral-v7-tekken as default template only (ggml-org#14390) metal : add special-case mat-vec mul for ne00 == 4 (ggml-org#14385) metal : batch rows copy in a single threadgroup (ggml-org#14384) docs: update s390x documentation + add faq (ggml-org#14389) musa: enable fp16 mma (all) and cublas on qy2 (ggml-org#13842) ggml-cpu: enable IBM NNPA Vector Intrinsics (ggml-org#14317) ggml : do not output unprintable characters on GGUF load failure (ggml-org#14381) sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (ggml-org#13973) opencl: ref count `ggml_backend_opencl_context` and refactor profiling (ggml-org#14254) batch : fix check for empty sequences in memory (ggml-org#14364) cmake : use LLAMA_BUILD_NUMBER when defining LLAMA_INSTALL_VERSION (ggml-org#14362) server : move no API key doc to /health (ggml-org#14352) main : honor --verbose-prompt on interactive prompts (ggml-org#14350) jinja : Add Mistral-Small-3.2-24B-Instruct-2506.jinja (ggml-org#14349) ...

This reverts commit 716301d.

yeahdongcn requested a review from JohannesGaessler as a code owner May 28, 2025 06:40

github-actions bot added Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels May 28, 2025

JohannesGaessler reviewed May 28, 2025

View reviewed changes

ggml/src/ggml-cuda/common.cuh Outdated Show resolved Hide resolved

ggml/src/ggml-cuda/ggml-cuda.cu Outdated Show resolved Hide resolved

ggml/src/ggml-cuda/ggml-cuda.cu Outdated Show resolved Hide resolved

JohannesGaessler reviewed May 28, 2025

View reviewed changes

ggml/src/ggml-cuda/ggml-cuda.cu Outdated Show resolved Hide resolved

yeahdongcn requested a review from JohannesGaessler May 29, 2025 10:45

yeahdongcn mentioned this pull request May 29, 2025

musa: extract ggml_cuda_mul_mat_batched_cublas_gemm_batched_ex #13887

Closed

3 tasks

JohannesGaessler approved these changes May 29, 2025

View reviewed changes

yeahdongcn force-pushed the xd/mma_blas branch from 8a5b53c to 2923d31 Compare June 4, 2025 10:44

yeahdongcn mentioned this pull request Jun 19, 2025

CUDA: mul_mat_v support for batch sizes > 1 #14262

Merged

yeahdongcn and others added 5 commits June 25, 2025 13:56

musa: enable fp16 mma (all) and cublas on qy2

78a76c7

Signed-off-by: Xiaodong Ye <[email protected]>

Update ggml/src/ggml-cuda/ggml-cuda.cu

876ee4c

Co-authored-by: Johannes Gäßler <[email protected]>

Address review comments

09a875e

Signed-off-by: Xiaodong Ye <[email protected]>

Address review comments

4cdb7fe

Signed-off-by: Xiaodong Ye <[email protected]>

musa: disable MUL_MAT_ID (q2_k × f32) due to precision issues

f083484

Signed-off-by: Xiaodong Ye <[email protected]>

yeahdongcn force-pushed the xd/mma_blas branch from 2923d31 to f083484 Compare June 25, 2025 06:13

yeahdongcn merged commit 716301d into ggml-org:master Jun 26, 2025
48 checks passed

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Jun 28, 2025

Revert "musa: enable fp16 mma (all) and cublas on qy2 (ggml-org#13842)"

f7a2958

This reverts commit 716301d.

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Jun 28, 2025

Revert "musa: enable fp16 mma (all) and cublas on qy2 (ggml-org#13842)"

1e65120

This reverts commit 716301d.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

musa: enable fp16 mma (all) and cublas on qy2 #13842

musa: enable fp16 mma (all) and cublas on qy2 #13842

Uh oh!

yeahdongcn commented May 28, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

yeahdongcn commented Jun 4, 2025

Uh oh!

yeahdongcn commented Jun 25, 2025

Uh oh!

Uh oh!

Uh oh!

musa: enable fp16 mma (all) and cublas on qy2 #13842

musa: enable fp16 mma (all) and cublas on qy2 #13842

Uh oh!

Conversation

yeahdongcn commented May 28, 2025

Testing Done

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

yeahdongcn commented Jun 4, 2025

Uh oh!

yeahdongcn commented Jun 25, 2025

Uh oh!

Uh oh!

Uh oh!