
vulkan: lock accesses of pinned_memory vector #14333


Merged: 1 commit, Jun 28, 2025

Conversation

jeffbolznv
Collaborator

I have not been able to reproduce a crash in test-thread-safety, but I did see WARNING: failed to free pinned memory: memory not in map, which points to a lack of locking around accesses to the pinned memory array.
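The fix amounts to guarding every read and write of the pinned-memory list with a single mutex. A minimal sketch of the pattern follows; the names pinned_memory, register_pinned, and free_pinned are hypothetical stand-ins, not the backend's actual identifiers:

```cpp
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the backend's pinned-memory bookkeeping:
// each entry records (host pointer, size, device buffer handle).
static std::vector<std::tuple<void*, size_t, int>> pinned_memory;
static std::mutex pinned_memory_mutex; // guards every access below

static void register_pinned(void* ptr, size_t size, int buf) {
    std::lock_guard<std::mutex> guard(pinned_memory_mutex);
    pinned_memory.emplace_back(ptr, size, buf);
}

static bool free_pinned(void* ptr) {
    std::lock_guard<std::mutex> guard(pinned_memory_mutex);
    for (size_t i = 0; i < pinned_memory.size(); i++) {
        if (std::get<0>(pinned_memory[i]) == ptr) {
            pinned_memory.erase(pinned_memory.begin() + i);
            return true;
        }
    }
    // Without the lock, a concurrent insert/erase racing with this scan
    // could produce this warning spuriously.
    fprintf(stderr, "WARNING: failed to free pinned memory: memory not in map\n");
    return false;
}
```

With unsynchronized access, one thread can erase an entry while another is scanning the vector, which is exactly the kind of race that surfaces as the "memory not in map" warning.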

@jeffbolznv jeffbolznv requested a review from 0cc4m June 22, 2025 20:47
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Jun 22, 2025
@bandoti
Collaborator

bandoti commented Jun 23, 2025

@jeffbolznv Not sure if this is related, but the ubuntu-22-cmake-vulkan job is seeing a CI test failure in test-thread-safety. This might provide a pathway to reproduce it—running on Ubuntu, that is.

I will try running the test locally out of curiosity and see what happens for me—though perhaps it won't be of any help in fixing the issue. 😅

@jeffbolznv
Collaborator Author

I haven't been able to reproduce the crash using lavapipe on WSL. If somebody is able to get a callstack where it crashes, that would be helpful.

@bandoti
Collaborator

bandoti commented Jun 23, 2025

Waiting to see if that Ubuntu 22 run fails again—and if so, I will create a temporary workflow that uploads all its assets and a crash dump.

@bandoti
Collaborator

bandoti commented Jun 23, 2025

It succeeded this time, so it looks a bit random. It would be great if the CI uploaded artifacts and a core dump when that happens. I can give that a go, but it'll probably take me a couple of days, working a little bit at a time.

@bandoti
Collaborator

bandoti commented Jun 23, 2025

I'm thinking valgrind might be useful to help us catch this one.

@bandoti
Collaborator

bandoti commented Jun 23, 2025

@jeffbolznv I managed to create a crash dump on the master branch. Because the file is so large, I just did a backtrace on all the threads to start. Here is the gist file with details.

https://gist.github.com/bandoti/b229ea2a8fced26f01b112834c826150

I used WSL2 and ran the following commands (mimicking the CI test):

git clone git@github.com:bandoti/llama.cpp.git
cd llama.cpp/
cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Debug
cmake --build build --config Debug
export GGML_VK_VISIBLE_DEVICES=0
./build/bin/test-thread-safety -hf ggml-org/models -hff tinyllamas/stories15M-q4_0.gguf -ngl 99 -p "The meaning of life is" -n 128 -c 256 -ub 32 -np 4

The crash occurred the first time, but not the second time I ran the executable. If you would like more info from the crash dump, please let me know and I'll fetch it with gdb.

@jeffbolznv
Collaborator Author

Did it say which thread crashed? Thread 1 is hitting the issue this PR fixes, so maybe it's the same bug.

I tried running with valgrind. There are a ton of warnings coming from lavapipe's use of LLVM. I'm not sure whether those are legitimate, but it does make me think we ought to mutex ggml_vk_load_shaders, since it's setting all the pipeline pointers in the device. I'll do that in a separate change.
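Serializing a one-time initialization like shader loading could look roughly like this. This is a hedged sketch: vk_device_stub and load_shaders are made-up stand-ins for the real device struct and ggml_vk_load_shaders, not the actual code:

```cpp
#include <mutex>

// Hypothetical device stub: the real device holds many pipeline pointers
// that ggml_vk_load_shaders fills in; one placeholder stands in for them.
struct vk_device_stub {
    std::mutex shader_mutex;       // serializes pipeline creation
    bool shaders_loaded = false;   // set once loading has completed
    void* example_pipeline = nullptr;
};

static void load_shaders(vk_device_stub& dev) {
    std::lock_guard<std::mutex> guard(dev.shader_mutex);
    if (dev.shaders_loaded) {
        return; // another thread already built the pipelines
    }
    dev.example_pipeline = &dev; // placeholder for real pipeline creation
    dev.shaders_loaded = true;
}
```

Without the lock, two threads entering shader loading at the same time could both write the device's pipeline pointers, which is what the valgrind warnings hint at.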

@bandoti
Collaborator

bandoti commented Jun 23, 2025

Yeah, thread 1 is crashing. I should probably pull this branch instead—but I've only seen the crash on master so far.

I find that AI is really great when it comes to parsing gdb/valgrind output! 😊

Collaborator

@0cc4m 0cc4m left a comment


LGTM

I wasn't aware of recursive mutexes, that's a cool feature. Always good to learn new stuff.
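For reference, a std::recursive_mutex lets the thread that already holds the lock acquire it again, which avoids deadlock when a locked entry point calls a helper that also takes the lock. A tiny illustrative sketch, not the PR's actual code:

```cpp
#include <mutex>

static std::recursive_mutex m; // re-entrant for the owning thread
static int counter = 0;

static void helper() {
    // Locks again on the same thread; a plain std::mutex would deadlock here.
    std::lock_guard<std::recursive_mutex> guard(m);
    counter++;
}

static void entry_point() {
    std::lock_guard<std::recursive_mutex> guard(m);
    helper(); // safe: the recursive mutex tracks ownership and a lock count
}
```

The mutex is released only once every nested lock_guard has been destroyed, so the lock count returns to zero when entry_point returns.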

@0cc4m 0cc4m merged commit 00d5282 into ggml-org:master Jun 28, 2025
47 checks passed