llama : build windows releases with dl backends #13220
Conversation
Can you provide more details on the clang Vulkan issue and how to reproduce it (or maybe file an issue)? Did you end up just using msvc for Vulkan instead?
Yes, it is still building the Vulkan release with msvc, same as before, but (at least) it also has the multiple CPU variants, which should give it better compatibility with different CPUs. When I tried to build Vulkan with clang, it failed with this error:
Here is the full log: https://github.com/slaren/llama.cpp/actions/runs/14762355462/job/41445824619
OK, I've heard about this before. I think clang puts the exe in a different folder, so we probably need a small change to the cmake file. I'll try to reproduce this soon.
I couldn't reproduce it locally. I suspect that it has something to do with this message while configuring cmake:
It seems that it thinks it is cross-compiling and uses a different compiler to build the shader-gen? Not sure what's going on there.
I tried building locally and while it eventually failed on some curl issue, it did get past the vulkan-shaders-gen part of the build. Looking at the log again, I noticed this mismatch of Debug vs Release:
Maybe this issue is specific to the Ninja Multi-Config generator?
One possible issue with this change that I didn't realize at first is that some of the examples are not compatible with GGML_BACKEND_DL. The most impactful of these are likely to be llava and the rpc server. cc @ngxson @rgerganov. Fixing this wouldn't be complicated. Essentially:
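The concrete steps are not included above; as a minimal sketch, assuming the fix means moving these examples onto the ggml backend registry (which is what the later rpc change in ggml-org#13304 does) rather than calling a specific backend's init function directly, it could look something like this (an illustration, not the author's exact steps):

```cpp
// Hedged sketch: select a device through the ggml backend registry instead of
// calling a specific ggml_backend_xxx_init(), so the example also works when
// the backends are separate shared libraries loaded at runtime (GGML_BACKEND_DL).
#include "ggml-backend.h"
#include <cstdio>

int main() {
    // load all backend shared libraries found next to the executable
    ggml_backend_load_all();

    // prefer a GPU device if any loaded backend provides one, otherwise use the CPU
    ggml_backend_dev_t dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_GPU);
    if (dev == nullptr) {
        dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU);
    }

    ggml_backend_t backend = ggml_backend_dev_init(dev, nullptr);
    std::printf("using device: %s\n", ggml_backend_dev_name(dev));

    ggml_backend_free(backend);
    return 0;
}
```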
I am traveling and won't be able to address this in the next few days, sorry. You can exclude rpc-server as a stop-gap solution.
* origin/master: (27 commits)
  llama : fix build_ffn without gate (ggml-org#13336)
  CUDA: fix bad asserts for partial offload (ggml-org#13337)
  convert : qwen2/3moe : set yarn metadata if present (ggml-org#13331)
  CUDA: fix --split-mode row for MMQ (ggml-org#13323)
  gguf-py : avoid requiring pyside6 for other scripts (ggml-org#13036)
  CUDA: fix logic for clearing padding with -ngl 0 (ggml-org#13320)
  sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (ggml-org#13264)
  server : Webui - change setText command from parent window to also send the message. (ggml-org#13309)
  mtmd : rename llava directory to mtmd (ggml-org#13311)
  clip : fix confused naming ffn_up and ffn_down (ggml-org#13290)
  convert : bailingmoe : set yarn metadata if present (ggml-org#13312)
  SYCL: Disable mul_mat kernels for noncontiguous tensor b (ggml-org#13308)
  mtmd : add C public API (ggml-org#13184)
  rpc : use backend registry, support dl backends (ggml-org#13304)
  ggml : activate s390x simd for Q3_K (ggml-org#13301)
  llava/mtmd : fixes to fully support dl backends (ggml-org#13303)
  llama : build windows releases with dl backends (ggml-org#13220)
  CUDA: fix race condition in MMQ stream-k fixup (ggml-org#13299)
  CUDA: fix race condition in MMQ ids_dst (ggml-org#13294)
  vulkan: Additional type support for unary, binary, and copy (ggml-org#13266)
  ...
Changes:
- Use GGML_BACKEND_DL and GGML_CPU_ALL_VARIANTS to build the Windows releases, enabling dynamic loading of backends (see the runtime sketch after the notes below)
- evict-old-files
- Update test-quantize-stats.cpp to work with GGML_BACKEND_DL
- Remove -march=native from the llvm cmake toolchain file
Notes:
Test run: https://github.com/slaren/llama.cpp/actions/runs/14762791544/job/41447243958
Test release: https://github.com/slaren/llama.cpp/releases/tag/b5235
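As a rough illustration of what the dynamic-loading change means at runtime (not part of the PR itself): with GGML_BACKEND_DL the backends ship as separate shared libraries in the release, and a program can list which ones were actually picked up. The snippet below is a hypothetical check using the ggml backend registry API:

```cpp
// Hypothetical check, not from this PR: enumerate the devices registered by the
// dynamically loaded backend libraries shipped in a GGML_BACKEND_DL release.
#include "ggml-backend.h"
#include <cstdio>

int main() {
    // scans for backend shared libraries, typically next to the executable
    ggml_backend_load_all();

    for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        std::printf("device %zu: %s (%s)\n", i,
                    ggml_backend_dev_name(dev),
                    ggml_backend_dev_description(dev));
    }
    return 0;
}
```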