-
Notifications
You must be signed in to change notification settings - Fork 12.2k
vulkan: lock accesses of pinned_memory vector #14333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jeffbolznv Not sure if this is related, but the ubuntu-22-cmake-vulkan is seeing a CI test failure in I will try and run the test locally and see what happens for me out of curiosity—though perhaps not of any help to fixing the issue. 😅 |
I haven't been able to reproduce the crash using lavapipe on WSL. If somebody is able to get a callstack where it crashes, that would be helpful. |
Waiting to see if that Ubuntu 22 run fails again—and if so, I will create a temporary workflow that uploads all its assets and a crash dump. |
It succeeded this time so looks a bit random. It would be great if the CI uploads artifacts and core dump when that happens. I can give that a go but it'll probably take me a couple days working a little bit at a time. |
I'm thinking |
@jeffbolznv I managed to create a crash dump on the master branch. Because the file is so large, I just did a backtrace on all the threads to start. Here is the gist file with details. https://gist.github.com/bandoti/b229ea2a8fced26f01b112834c826150 I used WSL2 and ran the following commands (mimicking the CI test):
The crash occurred the first time, but not the second time I ran the executable. If you would like more info from the crash dump, please let me know and I'll fetch it with gdb. |
Did it say which thread crashed? Thread 1 is hitting the issue this PR fixes, so maybe it's the same bug. I tried running with valgrind. There are a ton of warnings coming from lavapipe's use of LLVM. I'm not sure whether those are legitimate, but it does make me thing we ought to mutex ggml_vk_load_shaders since it's setting all the pipeline pointers in the device. I'll do that in a separate change. |
Yeah, thread 1 is crashing. I should probably pull this branch instead—but only saw the crash on the master so far. I find that AI is really great when it comes to parsing gdb/valgrind output! 😊 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I wasn't aware of recursive mutexes, that's a cool feature. Always good to learn new stuff.
I have not been able to reproduce a crash in test-thread-safety, but I did see
WARNING: failed to free pinned memory: memory not in map
which points to lack of locking accesses to the pinned memory array.