
llama : build windows releases with dl backends #13220


Merged: 3 commits merged into master from sl/backend-dl-releases on May 4, 2025

Conversation

slaren
Member

@slaren slaren commented Apr 30, 2025

Changes:

  • Uses GGML_BACKEND_DL and GGML_CPU_ALL_VARIANTS to build the Windows releases, enabling dynamic loading of backends (see the configuration sketch after this list)
  • Uses the clang compiler instead of MSVC when possible, for better performance
  • Creates a single release for the Windows x64 CPU build
  • Removes the Kompute build
  • Removes the outdated arm64-msvc build and cmake toolchain file
  • Changes sccache to ccache, since it actually works well enough on Windows and supports evict-old-files
  • Disables the build of test-quantize-stats.cpp with GGML_BACKEND_DL
  • Disables the outdated AVX-512 test that runs under an emulator
  • Removes -march=native from the llvm cmake toolchain file
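
For reference, a configure step along these lines enables the two options from the first bullet (a sketch only, not the exact workflow invocation; the real jobs also select the compiler, generator, and per-backend options, so their flags differ):

 cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON
 cmake --build build --config Release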

Notes:

  • I was not able to build the Vulkan release with clang on Windows. Fixing this may improve the performance of the Vulkan release with partial offloading (cc: @0cc4m @jeffbolznv)

Test run: https://github.com/slaren/llama.cpp/actions/runs/14762791544/job/41447243958
Test release: https://github.com/slaren/llama.cpp/releases/tag/b5235

@github-actions github-actions bot added the build (Compilation issues), testing (Everything test related), and devops (improvements to build systems and github actions) labels Apr 30, 2025
@jeffbolznv
Collaborator

Can you provide more details on the clang Vulkan issue and how to reproduce it (or maybe file an issue)? Did you end up just using msvc for Vulkan instead?

@slaren
Member Author

slaren commented Apr 30, 2025

Yes, the Vulkan release is still built with MSVC, same as before, but it at least also includes the multiple CPU variants, which should give it better compatibility across different CPUs. When I tried to build Vulkan with clang, it failed with this error:

 [84/271] Performing install step for 'vulkan-shaders-gen'
FAILED: ggml/src/ggml-vulkan/vulkan-shaders-gen-prefix/src/vulkan-shaders-gen-stamp/Release/vulkan-shaders-gen-install D:/a/llama.cpp/llama.cpp/build/ggml/src/ggml-vulkan/vulkan-shaders-gen-prefix/src/vulkan-shaders-gen-stamp/Release/vulkan-shaders-gen-install 
C:\Windows\system32\cmd.exe /C "cd /D D:\a\llama.cpp\llama.cpp\build\ggml\src\ggml-vulkan\vulkan-shaders-gen-prefix\src\vulkan-shaders-gen-build && "C:\Program Files\CMake\bin\cmake.exe" --install . && "C:\Program Files\CMake\bin\cmake.exe" -E touch D:/a/llama.cpp/llama.cpp/build/ggml/src/ggml-vulkan/vulkan-shaders-gen-prefix/src/vulkan-shaders-gen-stamp/Release/vulkan-shaders-gen-install"
-- Install configuration: "Release"
CMake Error at cmake_install.cmake:50 (file):
  file INSTALL cannot find
  "D:/a/llama.cpp/llama.cpp/build/bin/Release/vulkan-shaders-gen.exe": No
  error.

Here is the full log: https://github.com/slaren/llama.cpp/actions/runs/14762355462/job/41445824619

@jeffbolznv
Collaborator

OK, I've heard about this before. I think clang puts the exe in a different folder, so we probably need a small change to the cmake file. I'll try to reproduce this soon.

@slaren
Member Author

slaren commented Apr 30, 2025

I couldn't reproduce it locally. I suspect it has something to do with this message during the cmake configure step:

 -- Host compiler: C:/mingw64/bin/gcc.exe C:/mingw64/bin/g++.exe

It seems that cmake thinks it is cross-compiling and uses a different compiler to build vulkan-shaders-gen? Not sure what's going on there.

@jeffbolznv
Collaborator

I tried building locally, and while it eventually failed on some curl issue, it did get past the vulkan-shaders-gen part of the build.

Looking at the log again, I noticed this mismatch of Debug vs Release:

[2/2] Linking CXX executable D:\a\llama.cpp\llama.cpp\build\bin\Debug\vulkan-shaders-gen.exe

[84/271] Performing install step for 'vulkan-shaders-gen'
FAILED: ggml/src/ggml-vulkan/vulkan-shaders-gen-prefix/src/vulkan-shaders-gen-stamp/Release/vulkan-shaders-gen-install D:/a/llama.cpp/llama.cpp/build/ggml/src/ggml-vulkan/vulkan-shaders-gen-prefix/src/vulkan-shaders-gen-stamp/Release/vulkan-shaders-gen-install 
C:\Windows\system32\cmd.exe /C "cd /D D:\a\llama.cpp\llama.cpp\build\ggml\src\ggml-vulkan\vulkan-shaders-gen-prefix\src\vulkan-shaders-gen-build && "C:\Program Files\CMake\bin\cmake.exe" --install . && "C:\Program Files\CMake\bin\cmake.exe" -E touch D:/a/llama.cpp/llama.cpp/build/ggml/src/ggml-vulkan/vulkan-shaders-gen-prefix/src/vulkan-shaders-gen-stamp/Release/vulkan-shaders-gen-install"
-- Install configuration: "Release"
CMake Error at cmake_install.cmake:50 (file):
  file INSTALL cannot find
  "D:/a/llama.cpp/llama.cpp/build/bin/Release/vulkan-shaders-gen.exe": No
  error.

Maybe this issue is specific to the Ninja Multi-Config generator?
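
If that hypothesis is right, the sub-build is producing bin/Debug/vulkan-shaders-gen.exe while the install step looks under bin/Release/. A hypothetical way to check it outside CI (the sub-project path and these commands are assumptions, not the CI invocation) would be to build the vulkan-shaders-gen project alone with the same generator and force the configuration on every step:

 cmake -S ggml/src/ggml-vulkan/vulkan-shaders -B build-shaders -G "Ninja Multi-Config"
 cmake --build build-shaders --config Release
 cmake --install build-shaders --config Release --prefix install-shaders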

@slaren
Member Author

slaren commented May 1, 2025

One possible issue with this change that I didn't realize at first is that the examples that are not compatible with GGML_BACKEND_DL will no longer be included in the binary distribution. These are:

  • convert-llama2c-to-ggml
  • cvector-generator
  • export-lora
  • llava
  • rpc

The most impactful of these are likely to be the llava and the rpc server. cc @ngxson @rgerganov

Fixing this wouldn't be complicated. Essentially (see the sketch after this list):

  • Add a call to ggml_backend_load_all on startup to load the backends
  • Use the backend registry instead of the backend-specific functions

@rgerganov
Collaborator

The most impactful of these are likely to be the llava and the rpc server. cc @ngxson @rgerganov

I am traveling and won't be able to address this in the next few days, sorry. You can exclude rpc-server as a stop-gap solution.

@slaren slaren merged commit 9f2da58 into master May 4, 2025
43 checks passed
@slaren slaren deleted the sl/backend-dl-releases branch May 4, 2025 12:20
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request May 6, 2025
* origin/master: (27 commits)
llama : fix build_ffn without gate (ggml-org#13336)
CUDA: fix bad asserts for partial offload (ggml-org#13337)
convert : qwen2/3moe : set yarn metadata if present (ggml-org#13331)
CUDA: fix --split-mode row for MMQ (ggml-org#13323)
gguf-py : avoid requiring pyside6 for other scripts (ggml-org#13036)
CUDA: fix logic for clearing padding with -ngl 0 (ggml-org#13320)
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (ggml-org#13264)
server : Webui - change setText command from parent window to also send the message. (ggml-org#13309)
mtmd : rename llava directory to mtmd (ggml-org#13311)
clip : fix confused naming ffn_up and ffn_down (ggml-org#13290)
convert : bailingmoe : set yarn metadata if present (ggml-org#13312)
SYCL: Disable mul_mat kernels for noncontiguous tensor b (ggml-org#13308)
mtmd : add C public API (ggml-org#13184)
rpc : use backend registry, support dl backends (ggml-org#13304)
ggml : activate s390x simd for Q3_K (ggml-org#13301)
llava/mtmd : fixes to fully support dl backends (ggml-org#13303)
llama : build windows releases with dl backends (ggml-org#13220)
CUDA: fix race condition in MMQ stream-k fixup (ggml-org#13299)
CUDA: fix race condition in MMQ ids_dst (ggml-org#13294)
vulkan: Additional type support for unary, binary, and copy (ggml-org#13266)
...