Support task switches from command buffer callbacks #532

maleadt opened this issue Feb 3, 2025 · 0 comments

In #521, code was added to the MTLCommandBuffer on_completed callback that calls @error. That is not supported; in fact, any task switch from such a callback context results in a deadlock. Simply adding a print to the on_completed callback results in the following hang (a rough reproducer sketch follows the traces):

Main thread:

frame #0: 0x000000018ed0e6ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000018ed4c894 libsystem_pthread.dylib`_pthread_cond_wait + 1204
frame #2: 0x0000000199c0b19c Metal`-[_MTLCommandBuffer waitUntilCompleted] + 84
frame #3: 0x000000010331c074
frame #4: 0x000000010068d360 libjulia-internal.1.11.3.dylib`eval_stmt_value(stmt=<unavailable>, s=<unavailable>) at interpreter.c:174:23 [opt]
frame #5: 0x000000010068c018 libjulia-internal.1.11.3.dylib`eval_body(stmts=<unavailable>, s=0x000000016fdfd430, ip=<unavailable>, toplevel=<unavailable>) at interpreter.c:663:21 [opt]
frame #6: 0x000000010068c634 libjulia-internal.1.11.3.dylib`jl_interpret_toplevel_thunk(m=0x000000012e7c7180, src=<unavailable>) at interpreter.c:821:21 [opt]
frame #7: 0x00000001006a5c10 libjulia-internal.1.11.3.dylib`jl_toplevel_eval_flex(m=0x000000012e7c7180, e=0x000000010b807a50, fast=1, expanded=<unavailable>) at toplevel.c:943:18 [opt]
frame #8: 0x00000001006a5990 libjulia-internal.1.11.3.dylib`jl_toplevel_eval_flex(m=0x000000012e7c7180, e=0x000000010b807c50, fast=1, expanded=<unavailable>) at toplevel.c:886:19 [opt]
frame #9: 0x00000001006a6924 libjulia-internal.1.11.3.dylib`ijl_toplevel_eval_in [inlined] ijl_toplevel_eval(m=0x000000012e7c7180, v=0x000000010b807c50) at toplevel.c:952:12 [opt]
frame #10: 0x00000001006a6918 libjulia-internal.1.11.3.dylib`ijl_toplevel_eval_in(m=0x000000012e7c7180, ex=0x000000010b807c50) at toplevel.c:994:13 [opt]
frame #11: 0x000000012ad3b620 sys.dylib`japi1_include_string_72509.1 at boot.jl:430
frame #12: 0x000000012a99be94 sys.dylib`japi1__include_72518.1 at loading.jl:2794
frame #13: 0x000000012b399488 sys.dylib`julia_include_47041.1 at Base.jl:557
frame #14: 0x000000012ac7b16c sys.dylib`jfptr_include_47042.1 + 60
frame #15: 0x000000012b2fed74 sys.dylib`julia_exec_options_73759.1 at client.jl:323
frame #16: 0x000000012ac02368 sys.dylib`julia__start_73899.1 at client.jl:531
frame #17: 0x000000012af915dc sys.dylib`jfptr__start_73900.1 + 44
frame #18: 0x00000001006d73bc libjulia-internal.1.11.3.dylib`true_main [inlined] jl_apply(args=0x000000016fdfe560, nargs=1) at julia.h:2157:12 [opt]
frame #19: 0x00000001006d73b0 libjulia-internal.1.11.3.dylib`true_main(argc=<unavailable>, argv=<unavailable>) at jlapi.c:900:29 [opt]
frame #20: 0x00000001006d72c8 libjulia-internal.1.11.3.dylib`jl_repl_entrypoint(argc=<unavailable>, argv=<unavailable>) at jlapi.c:1059:15 [opt]
frame #21: 0x0000000100003f6c julia`main + 12
frame #22: 0x000000018e9cc274 dyld`start + 2840

Callback:

frame #0: 0x000000018ed0e6ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000018ed4c894 libsystem_pthread.dylib`_pthread_cond_wait + 1204
frame #2: 0x000000010074d678 libjulia-internal.1.11.3.dylib`uv_cond_wait(cond=0x0000000121073e28, mutex=0x0000000121073de8) at thread.c:806:7
frame #3: 0x00000001006bf200 libjulia-internal.1.11.3.dylib`ijl_task_get_next(trypoptask=0x000000012b9a1890, q=0x00000001726f4d90, checkempty=0x000000012bbd4190) at scheduler.c:584:21 [opt]
frame #4: 0x000000012ac43e94 sys.dylib`julia_poptask_66995.1 at task.jl:1012
frame #5: 0x000000012af9ebc0 sys.dylib`julia_wait_66501.1 at task.jl:1021
frame #6: 0x000000016a9a8130
frame #7: 0x000000016a98c12c
frame #8: 0x0000000199c06d7c Metal`-[_MTLCommandBuffer didCompleteWithStartTime:endTime:error:] + 608
frame #9: 0x00000001afd213fc IOGPU`-[IOGPUMetalCommandBuffer didCompleteWithStartTime:endTime:error:] + 216
frame #10: 0x0000000199c069c0 Metal`-[_MTLCommandQueue commandBufferDidComplete:startTime:completionTime:error:] + 108
frame #11: 0x00000001afd2e500 IOGPU`IOGPUNotificationQueueDispatchAvailableCompletionNotifications + 136
frame #12: 0x00000001afd2e610 IOGPU`__IOGPUNotificationQueueSetDispatchQueue_block_invoke + 64
frame #13: 0x000000018eb99674 libdispatch.dylib`_dispatch_client_callout4 + 20
frame #14: 0x000000018ebb5c88 libdispatch.dylib`_dispatch_mach_msg_invoke + 464
frame #15: 0x000000018eba0a38 libdispatch.dylib`_dispatch_lane_serial_drain + 352
frame #16: 0x000000018ebb69dc libdispatch.dylib`_dispatch_mach_invoke + 456
frame #17: 0x000000018eba0a38 libdispatch.dylib`_dispatch_lane_serial_drain + 352
frame #18: 0x000000018eba1764 libdispatch.dylib`_dispatch_lane_invoke + 432
frame #19: 0x000000018eba0a38 libdispatch.dylib`_dispatch_lane_serial_drain + 352
frame #20: 0x000000018eba1730 libdispatch.dylib`_dispatch_lane_invoke + 380
frame #21: 0x000000018ebac9a0 libdispatch.dylib`_dispatch_root_queue_drain_deferred_wlh + 288
frame #22: 0x000000018ebac1ec libdispatch.dylib`_dispatch_workloop_worker_thread + 540
frame #23: 0x000000018ed483d8 libsystem_pthread.dylib`_pthread_wqthread + 288
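
For reference, the hang boils down to something of this shape. This is a minimal sketch, not the code from #521; the MTLCommandQueue/MTLCommandBuffer constructors, commit!, wait_completed, and the on_completed do-block signature are assumptions about Metal.jl's MTL wrappers:

```julia
using Metal
using Metal.MTL

queue = MTLCommandQueue(device())   # assumed wrapper constructors
cmdbuf = MTLCommandBuffer(queue)

# Anything in the completion handler that may switch tasks (println, @error, ...)
# is enough: the handler runs on a libdispatch thread and ends up waiting in
# ijl_task_get_next while the main thread sits in waitUntilCompleted.
MTL.on_completed(cmdbuf) do buf     # assumed handler-registration helper
    println("command buffer completed")
end

commit!(cmdbuf)
wait_completed(cmdbuf)              # blocks the main thread in waitUntilCompleted
```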

The issue is that the callback wants to switch tasks while the main thread is blocked inside a system library. To demonstrate, @gbaraldi suggested launching the kernel from a separate thread (Threads.@spawn @metal ...); with the main thread no longer stuck, the callback can still switch tasks, as sketched below.
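
A rough sketch of that workaround, using a hypothetical fill kernel (only the Threads.@spawn wrapping matters here, and Julia needs to be started with multiple threads for the launch to land off the main thread):

```julia
using Metal

# hypothetical kernel, just to have something to launch
function fill_kernel(a, val)
    i = thread_position_in_grid_1d()
    if i <= length(a)
        @inbounds a[i] = val
    end
    return
end

a = MtlArray{Float32}(undef, 1024)

# Launch and synchronize from a worker thread so the main (I/O) thread is never
# blocked in waitUntilCompleted, leaving the completion callback free to switch tasks.
t = Threads.@spawn begin
    @metal threads=256 groups=cld(length(a), 256) fill_kernel(a, 42f0)
    Metal.synchronize()
end
wait(t)   # waiting on a Task yields; it does not block the thread in a system library
```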

In fact, this may suggest that the real fix here is to ensure we don't do blocking synchronization on thread 0 (the I/O thread), rather than changing how the callback works. Because of that, I'm filing the issue here rather than on ObjectiveC.jl. Such nonblocking synchronization could be ported over from CUDA.jl, where we already support waiting without blocking inside libcuda.so by relying on a pool of worker threads instead.
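
To illustrate the direction (this is not CUDA.jl's actual implementation, just the pattern): the calling task only ever waits on a Julia-side event, while a small pool of worker threads performs the blocking waits. blocking_wait below is a hypothetical stand-in for the underlying waitUntilCompleted call.

```julia
# Requests pair the object to wait on with an event to notify once it completes.
const sync_requests = Channel{Tuple{Any,Base.Event}}(Inf)

# Worker pool: these threads are the only ones allowed to block in the system library.
function start_sync_workers(n)
    for _ in 1:n
        Threads.@spawn while true
            obj, done = take!(sync_requests)
            blocking_wait(obj)   # hypothetical: e.g. the waitUntilCompleted call
            notify(done)
        end
    end
end

# Nonblocking synchronize: the calling task yields to the scheduler instead of
# blocking its thread, so command-buffer callbacks remain free to switch tasks.
function nonblocking_synchronize(obj)
    done = Base.Event()
    put!(sync_requests, (obj, done))
    wait(done)
end
```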
