
musa: enable fp16 mma (all) and cublas on qy2 #13842


Merged 5 commits into ggml-org:master on Jun 26, 2025

Conversation

yeahdongcn
Collaborator


This PR is a rework of #13149.
I will continue investigating the cublasGemmBatchedEx issue on the MTT S4000 with our MUBLAS team to identify a proper solution.

Testing Done

All tests below were performed on the MTT S80.

  • Build completed successfully
  • ./build/bin/test-backend-ops passed
    root@eab90abe42cc:/ws# ./build/bin/test-backend-ops 
    ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
    ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
    ggml_cuda_init: found 1 MUSA devices:
      Device 0: MTT S80, compute capability 2.1, VMM: yes
    Testing 2 devices
    
    Backend 1/2: MUSA0
      Device description: MTT S80
      Device memory: 16297 MB (15723 MB free)
    
      ABS(type=f16,ne_a=[128,2,2,2],v=0): OK
      ABS(type=f16,ne_a=[5,7,11,13],v=0): OK
      ...
      CROSS_ENTROPY_LOSS(type=f32,ne=[10,5,4,3]): OK
      CROSS_ENTROPY_LOSS(type=f32,ne=[30000,1,1,1]): OK
      CROSS_ENTROPY_LOSS_BACK(type=f32,ne=[10,5,4,3]): OK
      CROSS_ENTROPY_LOSS_BACK(type=f32,ne=[30000,1,1,1]): OK
      OPT_STEP_ADAMW(type=f32,ne=[10,5,4,3]): OK
      5527/5527 tests passed
      Backend MUSA0: OK
    
    Backend 2/2: CPU
      Skipping CPU backend
    2/2 backends passed
    OK
  • Tested DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf, qwen3_8b_q4_k_m.gguf, and nvidia-llama-3_1-nemotron-nano-8b-v1-q4_k_m.gguf, both with and without the -fa flag
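For reference, a minimal build-and-verify sequence along the lines of the testing above might look as follows. This is a sketch, not the exact commands used in this PR: it assumes the MUSA SDK toolchain is installed and follows the GGML_MUSA flag from the llama.cpp MUSA build docs; the model filename is one of those listed above and must be present locally.

```shell
# Configure and build llama.cpp with the MUSA backend enabled
# (requires the MUSA SDK; mccl/mublas are picked up by the toolchain).
cmake -B build -DGGML_MUSA=ON
cmake --build build --config Release -j

# Run the full backend op test suite on the MUSA device.
./build/bin/test-backend-ops

# Smoke-test a quantized model without, then with, flash attention.
./build/bin/llama-cli -m DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf -p "Hello" -n 32
./build/bin/llama-cli -m DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf -p "Hello" -n 32 -fa
```

Comparing the `-fa` and non-`-fa` runs is what exercises the fp16 mma path this PR enables.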

@github-actions github-actions bot added Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels May 28, 2025
@yeahdongcn
Collaborator Author

Let's hold off on merging this until I upgrade to the upcoming MUSA SDK and verify that everything works as expected.

@yeahdongcn
Collaborator Author

Rebased onto upstream/master.

@yeahdongcn yeahdongcn merged commit 716301d into ggml-org:master Jun 26, 2025
48 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jun 27, 2025
* mamba2-sync: (22 commits)
recurrent : call balloc split_reset() in init_batch() (ggml-org#14414)
ggml : add ggml_set_rows (ggml-org#14274)
convert : fix broken sentencepiece vocab (ggml-org#14416)
mamba : fix mismatched new and delete size for llm_build_mamba
model : gemma3n text-only (ggml-org#14400)
cmake: regen vulkan shaders when shaders-gen sources change (ggml-org#14398)
llama : return mistral-v7-tekken as default template only (ggml-org#14390)
metal : add special-case mat-vec mul for ne00 == 4 (ggml-org#14385)
metal : batch rows copy in a single threadgroup (ggml-org#14384)
docs: update s390x documentation + add faq (ggml-org#14389)
musa: enable fp16 mma (all) and cublas on qy2 (ggml-org#13842)
ggml-cpu: enable IBM NNPA Vector Intrinsics (ggml-org#14317)
ggml : do not output unprintable characters on GGUF load failure (ggml-org#14381)
sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (ggml-org#13973)
opencl: ref count `ggml_backend_opencl_context` and refactor profiling (ggml-org#14254)
batch : fix check for empty sequences in memory (ggml-org#14364)
cmake : use LLAMA_BUILD_NUMBER when defining LLAMA_INSTALL_VERSION (ggml-org#14362)
server : move no API key doc to /health (ggml-org#14352)
main : honor --verbose-prompt on interactive prompts (ggml-org#14350)
jinja : Add Mistral-Small-3.2-24B-Instruct-2506.jinja (ggml-org#14349)
...
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Jun 28, 2025
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Jun 28, 2025