Commit 5b0b8d8

sycl : Reenabled mmvq path for the SYCL Nvidia Backend (ggml-org#8372)
* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend
* Reduced verbosity of comment
1 parent 9925ca4 commit 5b0b8d8

1 file changed, +4 -0 lines changed
ggml/src/ggml-sycl.cpp

Lines changed: 4 additions & 0 deletions
@@ -3658,6 +3658,10 @@ static void ggml_sycl_mul_mat(ggml_backend_sycl_context & ctx, const ggml_tensor
     use_mul_mat_q = use_mul_mat_q && (src1->ne[1] <= MMQ_MAX_BATCH_SIZE);
 #endif // SYCL_USE_XMX
 
+    // mmvq path is faster in the CUDA backend.
+    if (ctx.stream()->get_backend() == sycl::backend::ext_oneapi_cuda)
+        use_dequantize_mul_mat_vec = use_dequantize_mul_mat_vec && !use_mul_mat_vec_q;
+
     if (!split && src0->type == GGML_TYPE_F16 && ggml_is_permuted(src0) && ggml_is_permuted(src1) && src1->ne[1] == 1) {
         // KQ single-batch
         ggml_sycl_mul_mat_vec_p021(ctx, src0, src1, dst);
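
The effect of the added check can be sketched in isolation. The snippet below is a minimal illustration, not the code in ggml-sycl.cpp: `Backend`, `MatVecPaths`, and `prefer_mmvq_on_cuda` are hypothetical names standing in for `sycl::backend` and the two local flags. It assumes the dequantize path is tested before the mmvq path later in the function, which this hunk does not show.

```cpp
// Hypothetical stand-in for sycl::backend; only the CUDA case matters here.
enum class Backend { level_zero, opencl, cuda };

// Hypothetical bundle of the two flags touched by the commit.
struct MatVecPaths {
    bool use_dequantize_mul_mat_vec;
    bool use_mul_mat_vec_q;
};

// Mirrors the added check: on the CUDA backend, the dequantize path is
// switched off whenever the mmvq path is eligible. Other backends are
// left unchanged.
MatVecPaths prefer_mmvq_on_cuda(Backend backend, MatVecPaths paths) {
    if (backend == Backend::cuda) {
        paths.use_dequantize_mul_mat_vec =
            paths.use_dequantize_mul_mat_vec && !paths.use_mul_mat_vec_q;
    }
    return paths;
}
```

With both flags set, a CUDA device ends up with only `use_mul_mat_vec_q` enabled, while any other backend keeps both flags as they were. If mmvq is not eligible, the dequantize flag is untouched on every backend.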
