
llama : build windows releases with dl backends #13220


Merged: 3 commits merged into master from sl/backend-dl-releases on May 4, 2025

Conversation

slaren
Member

@slaren slaren commented Apr 30, 2025

Changes:

  • Uses GGML_BACKEND_DL and GGML_CPU_ALL_VARIANTS to build the Windows releases, enabling dynamic loading of backends (see the configuration sketch after this list)
  • Uses the clang compiler instead of MSVC when possible, for better performance
  • Creates a single release for the Windows x64 CPU build
  • Removes the Kompute build
  • Removes the outdated arm64-msvc build and cmake toolchain file
  • Changes sccache to ccache, since it actually works well enough on Windows and supports evict-old-files
  • Disables the build of test-quantize-stats.cpp with GGML_BACKEND_DL
  • Disables the outdated AVX-512 test that runs under an emulator
  • Removes -march=native from the llvm cmake toolchain file
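
For reference, a configure step along these lines enables the two options from the first bullet (a sketch only, not the exact workflow invocation; the real jobs also select the compiler, generator, and per-backend options, so their flags differ):

 cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON
 cmake --build build --config Release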

Notes:

  • I was not able to build the Vulkan release with clang on Windows. Fixing this may improve the performance of the Vulkan release with partial offloading (cc: @0cc4m @jeffbolznv)

Test run: https://github.com/slaren/llama.cpp/actions/runs/14762791544/job/41447243958
Test release: https://github.com/slaren/llama.cpp/releases/tag/b5235

@github-actions github-actions bot added the build (Compilation issues), testing (Everything test related), and devops (improvements to build systems and github actions) labels Apr 30, 2025
@jeffbolznv
Collaborator

Can you provide more details on the clang Vulkan issue and how to reproduce it (or maybe file an issue)? Did you end up just using msvc for Vulkan instead?

@slaren
Member Author

slaren commented Apr 30, 2025

Yes, the Vulkan release is still built with MSVC, same as before, but it at least also includes the multiple CPU variants, which should give it better compatibility across different CPUs. When I tried to build Vulkan with clang, it failed with this error:

 [84/271] Performing install step for 'vulkan-shaders-gen'
FAILED: ggml/src/ggml-vulkan/vulkan-shaders-gen-prefix/src/vulkan-shaders-gen-stamp/Release/vulkan-shaders-gen-install D:/a/llama.cpp/llama.cpp/build/ggml/src/ggml-vulkan/vulkan-shaders-gen-prefix/src/vulkan-shaders-gen-stamp/Release/vulkan-shaders-gen-install 
C:\Windows\system32\cmd.exe /C "cd /D D:\a\llama.cpp\llama.cpp\build\ggml\src\ggml-vulkan\vulkan-shaders-gen-prefix\src\vulkan-shaders-gen-build && "C:\Program Files\CMake\bin\cmake.exe" --install . && "C:\Program Files\CMake\bin\cmake.exe" -E touch D:/a/llama.cpp/llama.cpp/build/ggml/src/ggml-vulkan/vulkan-shaders-gen-prefix/src/vulkan-shaders-gen-stamp/Release/vulkan-shaders-gen-install"
-- Install configuration: "Release"
CMake Error at cmake_install.cmake:50 (file):
  file INSTALL cannot find
  "D:/a/llama.cpp/llama.cpp/build/bin/Release/vulkan-shaders-gen.exe": No
  error.

Here is the full log: https://github.com/slaren/llama.cpp/actions/runs/14762355462/job/41445824619

@jeffbolznv
Collaborator

OK, I've heard about this before. I think clang puts the exe in a different folder, so we probably need a small change to the cmake file. I'll try to reproduce this soon.

@slaren
Member Author

slaren commented Apr 30, 2025

I couldn't reproduce it locally. I suspect it has something to do with this message during the cmake configure step:

 -- Host compiler: C:/mingw64/bin/gcc.exe C:/mingw64/bin/g++.exe

It seems that cmake thinks it is cross-compiling and uses a different compiler to build vulkan-shaders-gen? Not sure what's going on there.

@jeffbolznv
Collaborator

I tried building locally, and while it eventually failed on some curl issue, it did get past the vulkan-shaders-gen part of the build.

Looking at the log again, I noticed this mismatch of Debug vs Release:

[2/2] Linking CXX executable D:\a\llama.cpp\llama.cpp\build\bin\Debug\vulkan-shaders-gen.exe

[84/271] Performing install step for 'vulkan-shaders-gen'
FAILED: ggml/src/ggml-vulkan/vulkan-shaders-gen-prefix/src/vulkan-shaders-gen-stamp/Release/vulkan-shaders-gen-install D:/a/llama.cpp/llama.cpp/build/ggml/src/ggml-vulkan/vulkan-shaders-gen-prefix/src/vulkan-shaders-gen-stamp/Release/vulkan-shaders-gen-install 
C:\Windows\system32\cmd.exe /C "cd /D D:\a\llama.cpp\llama.cpp\build\ggml\src\ggml-vulkan\vulkan-shaders-gen-prefix\src\vulkan-shaders-gen-build && "C:\Program Files\CMake\bin\cmake.exe" --install . && "C:\Program Files\CMake\bin\cmake.exe" -E touch D:/a/llama.cpp/llama.cpp/build/ggml/src/ggml-vulkan/vulkan-shaders-gen-prefix/src/vulkan-shaders-gen-stamp/Release/vulkan-shaders-gen-install"
-- Install configuration: "Release"
CMake Error at cmake_install.cmake:50 (file):
  file INSTALL cannot find
  "D:/a/llama.cpp/llama.cpp/build/bin/Release/vulkan-shaders-gen.exe": No
  error.

Maybe this issue is specific to the Ninja Multi-Config generator?
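
If that hypothesis is right, the sub-build is producing bin/Debug/vulkan-shaders-gen.exe while the install step looks under bin/Release/. A hypothetical way to check it outside CI (the sub-project path and these commands are assumptions, not the CI invocation) would be to build the vulkan-shaders-gen project alone with the same generator and force the configuration on every step:

 cmake -S ggml/src/ggml-vulkan/vulkan-shaders -B build-shaders -G "Ninja Multi-Config"
 cmake --build build-shaders --config Release
 cmake --install build-shaders --config Release --prefix install-shaders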

@slaren
Member Author

slaren commented May 1, 2025

One possible issue with this change that I didn't realize at first is that the examples that are not compatible with GGML_BACKEND_DL will no longer be included in the binary distribution. These are:

  • convert-llama2c-to-ggml
  • cvector-generator
  • export-lora
  • llava
  • rpc

The most impactful of these are likely to be the llava and the rpc server. cc @ngxson @rgerganov

Fixing this wouldn't be complicated. Essentially (see the sketch after this list):

  • Add a call to ggml_backend_load_all on startup to load the backends
  • Use the backend registry instead of the backend-specific functions

@rgerganov
Collaborator

The most impactful of these are likely to be the llava and the rpc server. cc @ngxson @rgerganov

I am traveling and won't be able to address this in the next few days, sorry. You can exclude rpc-server as a stop-gap solution.

@slaren slaren merged commit 9f2da58 into master May 4, 2025
43 checks passed
@slaren slaren deleted the sl/backend-dl-releases branch May 4, 2025 12:20
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request May 6, 2025
* origin/master: (27 commits)
llama : fix build_ffn without gate (ggml-org#13336)
CUDA: fix bad asserts for partial offload (ggml-org#13337)
convert : qwen2/3moe : set yarn metadata if present (ggml-org#13331)
CUDA: fix --split-mode row for MMQ (ggml-org#13323)
gguf-py : avoid requiring pyside6 for other scripts (ggml-org#13036)
CUDA: fix logic for clearing padding with -ngl 0 (ggml-org#13320)
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (ggml-org#13264)
server : Webui - change setText command from parent window to also send the message. (ggml-org#13309)
mtmd : rename llava directory to mtmd (ggml-org#13311)
clip : fix confused naming ffn_up and ffn_down (ggml-org#13290)
convert : bailingmoe : set yarn metadata if present (ggml-org#13312)
SYCL: Disable mul_mat kernels for noncontiguous tensor b (ggml-org#13308)
mtmd : add C public API (ggml-org#13184)
rpc : use backend registry, support dl backends (ggml-org#13304)
ggml : activate s390x simd for Q3_K (ggml-org#13301)
llava/mtmd : fixes to fully support dl backends (ggml-org#13303)
llama : build windows releases with dl backends (ggml-org#13220)
CUDA: fix race condition in MMQ stream-k fixup (ggml-org#13299)
CUDA: fix race condition in MMQ ids_dst (ggml-org#13294)
vulkan: Additional type support for unary, binary, and copy (ggml-org#13266)
...