xe: sdpa: add support for reusable sdpa #3322

Open: wants to merge 1 commit into base: main

syurkevi
Contributor

Description

This PR adds support for reusable SDPA kernels, so recompilation can be skipped when only the sequence and query lengths change. Head size is still baked into the kernel. The microkernel configuration has been extracted and made part of the headers, since it must now be determined ahead of execution for caching.
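To illustrate the idea in the description, here is a minimal sketch of a kernel cache keyed only on the value that stays baked into the kernel (head size), with sequence and query lengths passed as run-time arguments. Every name in it (`kernel_key_t`, `runtime_args_t`, `kernel_cache_t`, `launch`) is hypothetical and chosen for illustration; this is not the oneDNN implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical compile-time key: only values baked into the kernel binary.
struct kernel_key_t {
    int head_size;
    bool operator<(const kernel_key_t &other) const {
        return head_size < other.head_size;
    }
};

// Hypothetical run-time arguments: supplied on every launch, never cached on.
struct runtime_args_t {
    int queries; // query sequence length
    int keys; // key/value sequence length
};

// Stand-in for a compiled kernel.
struct kernel_t {
    std::string name;
};

class kernel_cache_t {
public:
    // Returns the cached kernel for this key, compiling it on first use only.
    const kernel_t &get_or_compile(const kernel_key_t &key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++compilations_; // the expensive step a reusable kernel avoids
            kernel_t k {"sdpa_hs" + std::to_string(key.head_size)};
            it = cache_.emplace(key, k).first;
        }
        return it->second;
    }

    std::size_t compilations() const { return compilations_; }

private:
    std::map<kernel_key_t, kernel_t> cache_;
    std::size_t compilations_ = 0;
};

// Stand-in for a launch: the lengths arrive as arguments, not as build options.
inline std::string launch(const kernel_t &kernel, const runtime_args_t &args) {
    assert(args.queries > 0 && args.keys > 0);
    return kernel.name + "(q=" + std::to_string(args.queries)
            + ",k=" + std::to_string(args.keys) + ")";
}
```

In this sketch, two launches with head size 128 and key lengths 386 and 391 share one compilation, while a different head size triggers a second one.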

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

@syurkevi syurkevi requested review from a team as code owners May 23, 2025 03:29
@github-actions github-actions bot added platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel component:tests Codeowner: @oneapi-src/onednn-arch labels May 23, 2025
@syurkevi syurkevi force-pushed the syurkevi/reusable_sdpa branch from cee7f45 to 66a9d8b Compare May 29, 2025 03:31
@syurkevi syurkevi changed the title xe: sdpa: add support for reusable sdpa [WIP] xe: sdpa: add support for reusable sdpa May 29, 2025
@syurkevi syurkevi requested a review from umar456 May 29, 2025 03:33
@syurkevi
Contributor Author

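Passing the reusable parameters to the kernel costs less than 3% geomean regression.
Example of a reusable kernel cache hit (the second primitive differs only in key/value sequence length, 386 vs. 391):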

onednn_verbose,v1,primitive,create:cache_miss,gpu,sdpa,ocl:micro:any,undef,query:f16::blocked:abcd::f0 key:f16::blocked:abdc::f0 val:f16::blocked:abcd::f0 msk:f16::blocked:abcd::f0 dst:f16::blocked:abcd::f0,,msk:1d,1x4x1x128:1x4x128x386:1x4x386x128,6.40015

onednn_verbose,v1,primitive,create:kernel_cache_hit,gpu,sdpa,ocl:micro:any,undef,query:f16::blocked:abcd::f0 key:f16::blocked:abdc::f0 val:f16::blocked:abcd::f0 msk:f16::blocked:abcd::f0 dst:f16::blocked:abcd::f0,,msk:1d,1x4x1x128:1x4x128x391:1x4x391x128,0.00683594
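
For comparing such lines across a longer log, a small parser can help. The sketch below assumes the layout seen in the two lines above: comma-separated fields, the fourth field holding `create:<status>`, and the last field holding the creation time in milliseconds. The helper names (`create_record_t`, `split_commas`, `parse_create_line`) are hypothetical and not part of oneDNN.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one "create" line of verbose output.
struct create_record_t {
    std::string status; // e.g. "cache_miss" or "kernel_cache_hit"
    double time_ms; // last field of the line, assumed to be milliseconds
};

// Splits a line on commas, keeping empty fields in the middle.
inline std::vector<std::string> split_commas(const std::string &line) {
    std::vector<std::string> fields;
    std::stringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ','))
        fields.push_back(field);
    return fields;
}

// Extracts the cache status and creation time from a "create" line.
inline create_record_t parse_create_line(const std::string &line) {
    const std::vector<std::string> fields = split_commas(line);
    assert(fields.size() > 4);
    const std::string prefix = "create:";
    const std::string &event = fields[3];
    assert(event.compare(0, prefix.size(), prefix) == 0);
    return {event.substr(prefix.size()), std::stod(fields.back())};
}
```

Applied to the two lines above, this yields `cache_miss` at 6.40015 ms and `kernel_cache_hit` at 0.00683594 ms, so the cached creation in this single sample is roughly 900 times faster.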

@syurkevi
Contributor Author

make test
disable benchdnn_all
enable benchdnn_graph
enable test_device_gpu
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg

@syurkevi syurkevi force-pushed the syurkevi/reusable_sdpa branch from 66a9d8b to f1f7cce Compare May 29, 2025 17:14
@syurkevi
Contributor Author

make test
disable benchdnn_all
enable benchdnn_graph
enable test_device_gpu
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg
