Skip to content

[SYCL] Avoid infinite loop when kernel fails to compile with memory error #18888

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 20, 2025

Conversation

VerenaBeckham
Copy link
Contributor

The runtime tries to build a kernel again if compilation fails. But if UR returns a memory error the attempt counter was not compared against the maximum number of attempts, so the compiler was continuously called and eventually the loop counter would have overflowed.

…rror

The runtime tries to build a kernel again if compilation fails. But if
UR returns a memory error the attempt counter was not compared against
the maximum number of attempts, so the compiler was continuously called
and eventually the loop counter would have overflowed.
@VerenaBeckham VerenaBeckham requested a review from a team as a code owner June 10, 2025 12:04
@maarquitos14
Copy link
Contributor

@VerenaBeckham is it possible to add a test that would fail without your patch and work after your patch?

@VerenaBeckham
Copy link
Contributor Author

@VerenaBeckham is it possible to add a test that would fail without your patch and work after your patch?

I'm not sure. Even if you could write a test that allocates so much memory that it runs out, this test might still pass on a different architecture with more memory. Or do you know of a way to mock up this error?

@maarquitos14
Copy link
Contributor

maarquitos14 commented Jun 11, 2025

@VerenaBeckham is it possible to add a test that would fail without your patch and work after your patch?

I'm not sure. Even if you could write a test that allocates so much memory that it runs out, this test might still pass on a different architecture with more memory. Or do you know of a way to mock up this error?

I think a unit test with UrMock should do the trick for that: https://github.com/intel/llvm/blob/b437083ea7b827ea6798e3fcfae1fe0f926d1d6d/sycl/doc/developer/ContributeToDPCPP.md#dpc-headers-and-runtime-tests

You can find examples if you grep UrMock within sycl/unittests. Basically, UrMock allows you to redefine the behavior of any UR function for a given test, so you could redefine the function that is triggering the memory error to always return a memory error without even trying to allocate.

@VerenaBeckham
Copy link
Contributor Author

@VerenaBeckham is it possible to add a test that would fail without your patch and work after your patch?

I'm not sure. Even if you could write a test that allocates so much memory that it runs out, this test might still pass on a different architecture with more memory. Or do you know of a way to mock up this error?

I think a unit test with UrMock should do the trick for that: https://github.com/intel/llvm/blob/b437083ea7b827ea6798e3fcfae1fe0f926d1d6d/sycl/doc/developer/ContributeToDPCPP.md#dpc-headers-and-runtime-tests

You can find examples if you grep UrMock within sycl/unittests. Basically, UrMock allows you to redefine the behavior of any UR function for a given test, so you could redefine the function that is triggering the memory error to always return a memory error without even trying to allocate.

Thanks for the hint! I have added a test.

Copy link
Contributor

@maarquitos14 maarquitos14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of nits related to LLVM Coding Style, otherwise LGTM.

@VerenaBeckham
Copy link
Contributor Author

If you're happy with this @maarquitos14 do you want to "approve the workflows" and kick off the testing?

@VerenaBeckham
Copy link
Contributor Author

VerenaBeckham commented Jun 19, 2025

I'm sorry, I forgot to push my clang format fixes. 🤦 Fixed now.

@againull againull merged commit 8d55836 into intel:sycl Jun 20, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants