
sycl: Fix conditional enabling following arch checks for ggml-sycl #14504


Merged · 1 commit · Jul 3, 2025

Conversation

@s-Nick (Collaborator) commented on Jul 2, 2025

PR #13973 intended to enable the optimization for Intel devices by default, but due to a small boolean bug it remained disabled even with GGML_SYCL_DISABLE_OPT=0. This PR fixes that.
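A minimal sketch of the kind of inversion described here, assuming the optimization is gated on the GGML_SYCL_DISABLE_OPT environment variable; the function names are illustrative, not the actual ggml-sycl code:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical illustration: a "disable" env var should only disable the
// optimization when it is set to a non-zero value.

static bool opt_disabled_buggy(const char * env) {
    // Bug: an unset variable or an explicit "0" is treated as "disabled",
    // so the optimization never turns on, even with GGML_SYCL_DISABLE_OPT=0.
    return env == nullptr || std::strcmp(env, "0") == 0;
}

static bool opt_disabled_fixed(const char * env) {
    // Fix: only a set, non-zero value disables the optimization;
    // unset or "0" leaves it enabled by default.
    return env != nullptr && std::strcmp(env, "0") != 0;
}
```

With the fixed predicate, GGML_SYCL_DISABLE_OPT=0 (or leaving the variable unset) keeps the optimization on, which matches the intent of PR #13973.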

Performance comparison on Intel B580

| model | size | params | backend | ngl | sm | test | master t/s | 9edb916 t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | none | pp512 | 8545.67 ± 41.26 | 8559.06 ± 39.47 |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | none | tg128 | 110.97 ± 0.14 | 157.52 ± 0.53 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | none | pp512 | 8653.84 ± 31.87 | 8675.45 ± 75.79 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | none | tg128 | 100.36 ± 0.11 | 137.52 ± 0.20 |
| llama 7B Q4_0 | 3.57 GiB | 6.74 B | SYCL | 99 | none | pp512 | 2249.87 ± 3.22 | 2261.56 ± 4.04 |
| llama 7B Q4_0 | 3.57 GiB | 6.74 B | SYCL | 99 | none | tg128 | 41.60 ± 0.17 | 73.40 ± 0.26 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | none | pp512 | 2291.22 ± 1.64 | 2310.27 ± 4.83 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | none | tg128 | 33.19 ± 0.14 | 59.42 ± 0.54 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | none | pp512 | 6306.60 ± 17.54 | 6306.17 ± 23.65 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | none | tg128 | 70.01 ± 0.77 | 103.74 ± 0.12 |
| phi3 3B Q4_0 | 2.03 GiB | 3.82 B | SYCL | 99 | none | pp512 | 3389.80 ± 2.88 | 3412.82 ± 7.16 |
| phi3 3B Q4_0 | 2.03 GiB | 3.82 B | SYCL | 99 | none | tg128 | 66.12 ± 0.43 | 107.76 ± 0.30 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | none | pp512 | 3527.64 ± 7.33 | 3540.96 ± 9.11 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | none | tg128 | 53.77 ± 0.37 | 79.59 ± 0.36 |
| llama 34B Q6_K | 8.20 GiB | 10.73 B | SYCL | 99 | none | pp512 | 1573.21 ± 2.19 | 1575.07 ± 1.99 |
| llama 34B Q6_K | 8.20 GiB | 10.73 B | SYCL | 99 | none | tg128 | 21.06 ± 0.04 | 23.74 ± 0.06 |

@s-Nick s-Nick requested a review from Alcpz July 2, 2025 13:55
@github-actions bot added labels Jul 2, 2025: ggml (changes relating to the ggml tensor library for machine learning), SYCL (GPU programming language, https://en.wikipedia.org/wiki/SYCL)
@s-Nick s-Nick merged commit 7b63a71 into ggml-org:master Jul 3, 2025
47 of 48 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 3, 2025
* origin/master:
Fix conditional enabling following arch checks for ggml-sycl (ggml-org#14504)
convert : correct gemma 3n conversion (ggml-org#14450)
kv-cache : use ggml_set_rows (ggml-org#14285)
ggml : fix FA mask dim 2 and 3 (ggml-org#14505)
ggml : remove kompute backend (ggml-org#14501)
CUDA: add dynamic shared mem to softmax, refactor general usage (ggml-org#14497)