unsafe_free! is not thread-safe #503

Closed
pxl-th opened this issue Dec 6, 2023 · 5 comments · Fixed by #511

@pxl-th
Member

pxl-th commented Dec 6, 2023

I have encountered the following error several times, with both CUDA.jl and AMDGPU.jl.
I suspect GPUArrays.unsafe_free! is not thread-safe.
It doesn't cause major issues, since it happens in a finalizer, but still...

An MWE that reproduces it intermittently:

using AMDGPU

function main()
    for i in 1:100
        @show i
        x = AMDGPU.ones(Float32, 32 * 1024 * 1024) # 128 MiB
        y = sin.(x)
        AMDGPU.unsafe_free!(y)
        AMDGPU.unsafe_free!(x)
    end
end
main()
error in running finalizer: ArgumentError(msg="Attempt to release freed data.")
release at /home/pxl-th/.julia/packages/GPUArrays/dAUOE/src/host/abstractarray.jl:38
unsafe_free! at /home/pxl-th/.julia/packages/GPUArrays/dAUOE/src/host/abstractarray.jl:90 [inlined]
unsafe_finalize! at /home/pxl-th/.julia/dev/AMDGPU/src/array.jl:35
unknown function (ip: 0x7fd0b6fe15e5)
_jl_invoke at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:3076
run_finalizer at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gc.c:316
jl_gc_run_finalizers_in_list at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gc.c:408
run_finalizers at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gc.c:452
jl_mutex_unlock at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/julia_locks.h:80 [inlined]
jl_generate_fptr_impl at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/jitlayers.cpp:525
jl_compile_method_internal at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:2480 [inlined]
jl_compile_method_internal at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:2368
_jl_invoke at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:2886 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:3076
release at /home/pxl-th/.julia/packages/GPUArrays/dAUOE/src/host/abstractarray.jl:42
unsafe_free! at /home/pxl-th/.julia/packages/GPUArrays/dAUOE/src/host/abstractarray.jl:90 [inlined]
unsafe_free! at /home/pxl-th/.julia/dev/AMDGPU/src/array.jl:34 [inlined]
main at /home/pxl-th/code/Nerf.jl/benchmark/pipeline.jl:62
unknown function (ip: 0x7fd0b6f23be2)
_jl_invoke at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/interpreter.c:617
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/interpreter.c:775
top-level scope at /home/pxl-th/code/Nerf.jl/benchmark/pipeline.jl:66
jl_toplevel_eval_flex at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
include_string at ./loading.jl:2070
_jl_invoke at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:3076
_include at ./loading.jl:2130
include at ./Base.jl:495
jfptr_include_46402.1 at /home/pxl-th/bin/julia-1.10.0-rc2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:3076
exec_options at ./client.jl:318
_start at ./client.jl:552
jfptr__start_82776.1 at /home/pxl-th/bin/julia-1.10.0-rc2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci5-4/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at julia (unknown line)
unknown function (ip: 0x7fd0b8229d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
@maleadt
Member

maleadt commented Jan 9, 2024

Where are threads being used here?

FWIW, GPUArray objects are not thread-safe (neither are operations on Base.Array objects, Dicts, etc.). If you're using the same object in parallel, you should guard it with a lock. Operations on the (possibly shared) DataRef that array objects point to are supposed to be thread-safe.
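A minimal sketch of the lock pattern described here, using a plain `Dict` (one of the non-thread-safe types mentioned) so it runs without a GPU. The function name `tally` is hypothetical; the same `lock(lk) do ... end` wrapping would apply to any object shared across tasks, including a GPU array.

```julia
# Sketch: guard a shared, non-thread-safe container with a lock.
# `tally` is a hypothetical example function, not part of any package.
function tally(n)
    counts = Dict{Int,Int}()   # Dict is not safe for concurrent mutation
    lk = ReentrantLock()       # one lock protecting every access to `counts`
    Threads.@threads for i in 1:n
        key = i % 10
        lock(lk) do            # only one task mutates the Dict at a time
            counts[key] = get(counts, key, 0) + 1
        end
    end
    return counts
end
```

Run with e.g. `julia -t 4` so `@threads` actually spreads the loop over several threads; `tally(1000)` then holds 100 for each key 0 through 9. Without the lock, concurrent `setindex!` calls on the `Dict` could corrupt it.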

@pxl-th
Member Author

pxl-th commented Jan 9, 2024

Since this happens in the finalizer, I thought the GC was running on another thread.
Then, if the GC kicks in while unsafe_free! is being invoked manually, both calls can end up in the release function.

@maleadt
Member

maleadt commented Jan 9, 2024

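There are no threads involved here. The problem is that a finalizer is kicked off during execution of the manual free (in the trace, the finalizer runs from inside `release`, while it is being compiled), so the finalizer still sees the array as not freed. I think we just need to switch the order of setting `arr.freed`.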
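The reordering can be illustrated with a self-contained mock. `MockRef`, `release!`, `free_buggy!`, `free_fixed!`, `demo` and the `on_last_release` hook are all hypothetical stand-ins, not the real GPUArrays `DataRef`/`release` code; the hook simulates a finalizer that fires on the same thread while the last reference is being released, as in the trace above. Setting `freed` before calling `release!` makes the re-entrant call return early.

```julia
# Hypothetical mock of a reference-counted buffer handle; NOT the real GPUArrays.DataRef.
mutable struct MockRef
    count::Threads.Atomic{Int}  # live references to the underlying buffer
    freed::Bool                 # has this handle already been released?
    on_last_release::Function   # simulated finalizer that may re-enter during release
end

MockRef() = MockRef(Threads.Atomic{Int}(1), false, () -> nothing)

function release!(ref::MockRef)
    n = Threads.atomic_sub!(ref.count, 1)  # returns the value *before* subtracting
    if n == 1
        ref.on_last_release()              # freeing memory; a finalizer may run here
    elseif n < 1
        throw(ArgumentError("Attempt to release freed data."))
    end
    return nothing
end

# Buggy order: `freed` is set only after release!, so a re-entrant call still sees `false`.
function free_buggy!(ref::MockRef)
    ref.freed && return nothing
    release!(ref)
    ref.freed = true
    return nothing
end

# Fixed order: mark the handle as freed first, so a re-entrant call returns early.
function free_fixed!(ref::MockRef)
    ref.freed && return nothing
    ref.freed = true
    release!(ref)
    return nothing
end

# Run a manual free whose release triggers a simulated finalizer calling the same free.
function demo(free!)
    ref = MockRef()
    errors = Exception[]
    ref.on_last_release = () -> begin
        try
            free!(ref)          # simulated finalizer re-entering mid-release
        catch err
            push!(errors, err)  # Julia logs finalizer errors instead of rethrowing them
        end
    end
    free!(ref)
    return errors
end
```

`demo(free_buggy!)` collects one `ArgumentError("Attempt to release freed data.")`, mirroring the "error in running finalizer" message, while `demo(free_fixed!)` collects none. No second thread is needed to trigger the bug, only re-entrancy.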

@maleadt
Member

maleadt commented Jan 9, 2024

Can you check if #511 works?

@pxl-th
Member Author

pxl-th commented Jan 9, 2024

Strangely, I can't reproduce this issue anymore. I was using a 1.10 pre-release back then (the trace above shows 1.10.0-rc2), and now I'm on the released version.
