GPU compiler error in tendency computation using `DiscreteForcing` with GPU + `Float32` + immersed `RectilinearGrid` #4192

ali-ramadhan · 2025-03-10T20:44:40Z

This might be similar to issue #4165 but this time the GPU compiler error is in the tendency computation, and specifically div_𝐯u.

Switching to the CPU, to Float64, to a LatitudeLongitudeGrid, or to a non-immersed RectilinearGrid makes the MWE run without error. So the error only comes up in this very specific configuration.

MWE:

using Oceananigans

underlying_grid = RectilinearGrid(GPU(), Float32;
    topology = (Bounded, Bounded, Bounded),
    size = (10, 10, 10),
    x = (0, 1),
    y = (0, 1),
    z = (-1, 0)
)

height = 1/5
width = 1/5
mount(x, y) = height * exp(-x^2 / 2width^2) * exp(-y^2 / 2width^2)
bottom(x, y) = -1 + mount(x, y)

grid = ImmersedBoundaryGrid(underlying_grid, GridFittedBottom(bottom))

@inline relax(i, j, k, grid, clock, fields, p) = - p.rate * (fields.u[i, j, k] - p.u★)

params = (
    rate = 1.0,
    u★ = 0.0
)

u_forcing = Forcing(relax; discrete_form=true, parameters=params)

forcing = (;
    u = u_forcing
)

model = NonhydrostaticModel(; grid, forcing)

simulation = Simulation(model, Δt=0.01, stop_iteration=1)

run!(simulation)

Error:

ERROR: InvalidIRError: compiling MethodInstance for Oceananigans.Models.NonhydrostaticModels.gpu_compute_Gu!(::KernelAbstractions.CompilerMetadata{…}, ::OffsetArrays.OffsetArray{…}, ::ImmersedBoundaryGrid{…}, ::Nothing, ::Tuple{…}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to +)
Stacktrace:
 [1] div_𝐯u
   @ ~/atdepth/Oceananigans.jl/src/Advection/momentum_advection_operators.jl:47
 [2] u_velocity_tendency
   @ ~/atdepth/Oceananigans.jl/src/Models/NonhydrostaticModels/nonhydrostatic_tendency_kernel_functions.jl:68
 [3] gpu_compute_Gu!
   @ ~/.julia/packages/KernelAbstractions/sWSE0/src/macros.jl:322
 [4] gpu_compute_Gu!
   @ ./none:0
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/mgx54/src/validation.jl:167
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/mgx54/src/driver.jl:382 [inlined]
  [3] emit_llvm(job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/mgx54/src/utils.jl:110
  [4] emit_llvm
    @ ~/.julia/packages/GPUCompiler/mgx54/src/utils.jl:108 [inlined]
  [5] compile_unhooked(output::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/mgx54/src/driver.jl:95
  [6] compile_unhooked
    @ ~/.julia/packages/GPUCompiler/mgx54/src/driver.jl:80 [inlined]
  [7] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/mgx54/src/driver.jl:67
  [8] compile
    @ ~/.julia/packages/GPUCompiler/mgx54/src/driver.jl:55 [inlined]
  [9] #1171
    @ ~/.julia/packages/CUDA/jkvdc/src/compiler/compilation.jl:255 [inlined]
 [10] JuliaContext(f::CUDA.var"#1171#1174"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/mgx54/src/driver.jl:34
 [11] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/mgx54/src/driver.jl:25
 [12] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/jkvdc/src/compiler/compilation.jl:254
 [13] actual_compilation(cache::Dict{Any, CUDA.CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/mgx54/src/execution.jl:245
 [14] cached_compilation(cache::Dict{Any, CUDA.CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/mgx54/src/execution.jl:159
 [15] macro expansion
    @ ~/.julia/packages/CUDA/jkvdc/src/compiler/execution.jl:373 [inlined]
 [16] macro expansion
    @ ./lock.jl:267 [inlined]
 [17] cufunction(f::typeof(Oceananigans.Models.NonhydrostaticModels.gpu_compute_Gu!), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{…}, OffsetArrays.OffsetArray{…}, ImmersedBoundaryGrid{…}, Nothing, Tuple{…}}}; kwargs::@Kwargs{always_inline::Bool, maxthreads::Int64})
    @ CUDA ~/.julia/packages/CUDA/jkvdc/src/compiler/execution.jl:368
 [18] macro expansion
    @ ~/.julia/packages/CUDA/jkvdc/src/compiler/execution.jl:112 [inlined]
 [19] (::KernelAbstractions.Kernel{…})(::Field{…}, ::Vararg{…}; ndrange::Nothing, workgroupsize::Nothing)
    @ CUDA.CUDAKernels ~/.julia/packages/CUDA/jkvdc/src/CUDAKernels.jl:103
 [20] (::KernelAbstractions.Kernel{…})(::Field{…}, ::Vararg{…})
    @ CUDA.CUDAKernels ~/.julia/packages/CUDA/jkvdc/src/CUDAKernels.jl:89
 [21] _launch!(::GPU{…}, ::ImmersedBoundaryGrid{…}, ::Symbol, ::Function, ::Field{…}, ::ImmersedBoundaryGrid{…}, ::Vararg{…}; exclude_periphery::Bool, reduced_dimensions::Tuple{}, active_cells_map::Nothing)
    @ Oceananigans.Utils ~/atdepth/Oceananigans.jl/src/Utils/kernel_launching.jl:298
 [22] _launch!
    @ ~/atdepth/Oceananigans.jl/src/Utils/kernel_launching.jl:275 [inlined]
 [23] launch!
    @ ~/atdepth/Oceananigans.jl/src/Utils/kernel_launching.jl:258 [inlined]
 [24] #compute_interior_tendency_contributions!#17
    @ ~/atdepth/Oceananigans.jl/src/Models/NonhydrostaticModels/compute_nonhydrostatic_tendencies.jl:105 [inlined]
 [25] compute_interior_tendency_contributions!
    @ ~/atdepth/Oceananigans.jl/src/Models/NonhydrostaticModels/compute_nonhydrostatic_tendencies.jl:57 [inlined]
 [26] compute_tendencies!(model::NonhydrostaticModel{…}, callbacks::Vector{…})
    @ Oceananigans.Models.NonhydrostaticModels ~/atdepth/Oceananigans.jl/src/Models/NonhydrostaticModels/compute_nonhydrostatic_tendencies.jl:35
 [27] #apply_regionally!#56
    @ ~/atdepth/Oceananigans.jl/src/Utils/multi_region_transformation.jl:121 [inlined]
 [28] apply_regionally!
    @ ~/atdepth/Oceananigans.jl/src/Utils/multi_region_transformation.jl:118 [inlined]
 [29] macro expansion
    @ ~/atdepth/Oceananigans.jl/src/Utils/multi_region_transformation.jl:206 [inlined]
 [30] update_state!(model::NonhydrostaticModel{…}, callbacks::Vector{…}; compute_tendencies::Bool)
    @ Oceananigans.Models.NonhydrostaticModels ~/atdepth/Oceananigans.jl/src/Models/NonhydrostaticModels/update_nonhydrostatic_model_state.jl:53
 [31] update_state! (repeats 2 times)
    @ ~/atdepth/Oceananigans.jl/src/Models/NonhydrostaticModels/update_nonhydrostatic_model_state.jl:20 [inlined]
 [32] initialize!(sim::Simulation{NonhydrostaticModel{…}, Float32, Float32, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}})
    @ Oceananigans.Simulations ~/atdepth/Oceananigans.jl/src/Simulations/run.jl:208
 [33] time_step!(sim::Simulation{NonhydrostaticModel{…}, Float32, Float32, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}})
    @ Oceananigans.Simulations ~/atdepth/Oceananigans.jl/src/Simulations/run.jl:138
 [34] run!(sim::Simulation{NonhydrostaticModel{…}, Float32, Float32, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}}; pickup::Bool)
    @ Oceananigans.Simulations ~/atdepth/Oceananigans.jl/src/Simulations/run.jl:105
 [35] run!(sim::Simulation{NonhydrostaticModel{…}, Float32, Float32, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}})
    @ Oceananigans.Simulations ~/atdepth/Oceananigans.jl/src/Simulations/run.jl:92
 [36] top-level scope
    @ REPL[15]:1
Some type information was truncated. Use `show(err)` to see complete types.

Environment: Oceananigans main branch (v0.95.23, commit 40e0a8733) with

julia> versioninfo()
Julia Version 1.10.8
Commit 4c16ff44be8 (2025-01-22 10:06 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 48 × AMD Ryzen Threadripper 7960X 24-Cores
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 16 default, 0 interactive, 8 GC (on 48 virtual cores)
Environment:
  LD_PRELOAD = /usr/NX/lib/libnxegl.so

julia> CUDA.versioninfo()
CUDA runtime 12.8, artifact installation
CUDA driver 12.8
NVIDIA driver 570.86.16

CUDA libraries: 
- CUBLAS: 12.8.3
- CURAND: 10.3.9
- CUFFT: 11.3.3
- CUSOLVER: 11.7.2
- CUSPARSE: 12.5.7
- CUPTI: 2025.1.0 (API 26.0.0)
- NVML: 12.0.0+570.86.16

Julia packages: 
- CUDA: 5.6.1
- CUDA_Driver_jll: 0.12.0+0
- CUDA_Runtime_jll: 0.16.0+0

Toolchain:
- Julia: 1.10.8
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce RTX 4090 (sm_89, 20.040 GiB / 23.988 GiB available)


ali-ramadhan · 2025-03-10T20:47:19Z

I know it's not easy to debug this stuff and it's probably an upstream issue. So I'm just opening this issue to document the bug.

It's also a pretty specific configuration so this is a low impact issue/bug.

glwagner · 2025-03-10T20:50:22Z

That's pretty interesting though.

Is it fixed by using

params = (
    rate = 1,
    u★ = 0
)

?

ali-ramadhan · 2025-03-10T20:56:25Z

Good catch! Should always be careful about types. Still getting the same error with Ints and also with

params = (
    rate = 1.0f0,
    u★ = 0.0f0
)
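For context on why the parameter types were worth checking: under Julia's promotion rules a Float64 literal widens a Float32 value to Float64, while an Int literal does not. The sketch below is plain Julia and independent of Oceananigans; the variable names are illustrative stand-ins for `fields.u[i, j, k]`, `p.rate`, and `p.u★`.

```julia
# Stand-in for a Float32 field value such as fields.u[i, j, k].
u = 1.0f0

# Float64 parameters (rate = 1.0, u★ = 0.0) promote the result to Float64.
forcing_f64 = -1.0 * (u - 0.0)

# Int parameters (rate = 1, u★ = 0) leave the result in Float32.
forcing_int = -1 * (u - 0)

# Float32 parameters (rate = 1.0f0, u★ = 0.0f0) also stay in Float32.
forcing_f32 = -1.0f0 * (u - 0.0f0)

println(typeof(forcing_f64), " ", typeof(forcing_int), " ", typeof(forcing_f32))
```

Since the error persists with Int and Float32 parameters, the forcing parameters are not the source of the promotion here.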

simone-silvestri · 2025-03-10T21:04:47Z

Is this a forcing-related issue or advection? What happens if you remove the forcing?

glwagner · 2025-03-10T21:05:29Z

Ah but actually you aren't adding forcing to the model anyways

ali-ramadhan · 2025-03-10T21:21:31Z

Ah sorry for the typo. The forcing should be in there. I edited the MWE to include it. I was about to test with/without forcing.

Turns out actually you don't need the forcing! This MWE without the forcing produces the same error:

using Oceananigans

underlying_grid = RectilinearGrid(GPU(), Float32;
    topology = (Bounded, Bounded, Bounded),
    size = (10, 10, 10),
    x = (0, 1),
    y = (0, 1),
    z = (-1, 0)
)

height = 1/5
width = 1/5
mount(x, y) = height * exp(-x^2 / 2width^2) * exp(-y^2 / 2width^2)
bottom(x, y) = -1 + mount(x, y)

grid = ImmersedBoundaryGrid(underlying_grid, GridFittedBottom(bottom))

model = NonhydrostaticModel(; grid)

simulation = Simulation(model, Δt=0.01, stop_iteration=1)

run!(simulation)

glwagner · 2025-03-10T21:34:59Z

You can also use bottom(x, y) = -0.5. Quite weird!

glwagner · 2025-03-10T21:44:52Z

I'm finding there is a type instability in div_𝐯u; sometimes it is Float64, other times Float32. It's unclear whether this is producing the error, but it seems suspicious.
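One common way this kind of Float32/Float64 mismatch arises is a Float64 coefficient inside an otherwise Float32 computation. The sketch below uses two hypothetical toy functions (not the Oceananigans implementation of div_𝐯u) and `Base.return_types` to compare the inferred return type with and without a matched coefficient type.

```julia
# Hypothetical toy interpolation functions, for illustration only.

# The Float64 literal 0.5 promotes the result when the inputs are Float32.
interp_hardcoded(a, b) = 0.5 * (a + b)

# Converting the coefficient to the input type keeps the result in Float32.
interp_matched(a::T, b::T) where T = convert(T, 0.5) * (a + b)

# Ask the compiler which return type it infers for Float32 inputs.
hardcoded_type = only(Base.return_types(interp_hardcoded, (Float32, Float32)))
matched_type   = only(Base.return_types(interp_matched, (Float32, Float32)))

println("hardcoded coefficient: ", hardcoded_type)
println("matched coefficient:   ", matched_type)
```

Running the same check on the real kernel functions requires the actual grid and field types, so treat this only as a way to see the mechanism in isolation.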

simone-silvestri · 2025-03-10T21:47:06Z

Maybe using Metal and a hydrostatic free surface model might shed light on the instability.
I guess we probably have to change the Oceananigans.defaults.FloatType to Float32.

glwagner · 2025-03-10T21:47:59Z

I tried that, but there is still promotion

glwagner · 2025-03-10T21:52:59Z

may have found the bug

glwagner · 2025-03-10T21:54:28Z

#4193

glwagner · 2025-03-10T21:59:03Z

Yeah so #4193 closes this, provided that we add

Oceananigans.defaults.FloatType = Float32

Another way is to specify Float32 in the advection scheme (should be -- I didn't test explicitly).

In the context of #4193, the error can be reproduced by setting the default to Float64 (or leaving it alone) and manually setting the grid to Float32.

So basically, sometimes promotion works (which is actually bad but does not error) and other times it throws an error (actually what we want, but it is surprising).

glwagner · 2025-03-10T21:59:20Z

I think we should document how to change number type somewhere

ali-ramadhan · 2025-03-10T22:26:25Z

Quick catch! Weird how sometimes it promotes and sometimes it errors haha.

But definitely good to document how to properly change number type (probably mostly to Float32 for GPUs at least).

glwagner · 2025-03-10T22:46:54Z

Right I think we are hitting a compiler heuristic. When promotion is not completely inlined, we get an error.

glwagner · 2025-03-10T22:47:08Z

> Quick catch! Weird how sometimes it promotes and sometimes it errors haha.
>
> But definitely good to document how to properly change number type (probably mostly to Float32 for GPUs at least).

Where should we put this in the docs?

ali-ramadhan · 2025-03-10T23:09:20Z

Honestly I would advocate for a top-level page alongside the grids and fields pages. Could be called "Number type" or "Float precision"?

Right now I think we just have this in the legacy docs which definitely need embellishing: https://clima.github.io/OceananigansDocumentation/stable/model_setup/number_type/

glwagner · 2025-03-10T23:35:56Z

> Honestly I would advocate for a top-level page alongside the grids and fields pages. Could be called "Number type" or "Float precision"?
>
> Right now I think we just have this in the legacy docs which definitely need embellishing: https://clima.github.io/OceananigansDocumentation/stable/model_setup/number_type/

I was hoping we would eventually get around to adding tutorials for models. It might belong in such a page, or perhaps after it. Because then we can illustrate how it will affect all types simultaneously, not just one.

That said, building a tutorial for the models is a little daunting, whereas a simple page to comment on number type is easy, so maybe we should just throw it up and worry about a model tutorial in the longer run.

glwagner · 2025-03-10T23:39:14Z

Although actually a tutorial for models would be lifting a lot of that existing material (e.g. just reorganizing it).

ali-ramadhan · 2025-03-14T19:44:04Z

@glwagner Does #4193 fix the MWE for you? I'm still getting the same error with Oceananigans v0.95.27.

This more minimal MWE (as suggested above) still produces the GPU compilation error:

using Oceananigans

underlying_grid = RectilinearGrid(GPU(), Float32;
    topology = (Bounded, Bounded, Bounded),
    size = (10, 10, 10),
    x = (0, 1),
    y = (0, 1),
    z = (-1, 0)
)

bottom(x, y) = -0.5

grid = ImmersedBoundaryGrid(underlying_grid, GridFittedBottom(bottom))

model = NonhydrostaticModel(; grid)

simulation = Simulation(model, Δt=0.01, stop_iteration=1)

run!(simulation)

glwagner · 2025-03-14T20:54:50Z

Ah sorry! I could have been clearer. I think the correct answer is that what you have written is not supported. However, you can try this:

using Oceananigans

Oceananigans.defaults.FloatType = Float32

underlying_grid = RectilinearGrid(GPU();
    topology = (Bounded, Bounded, Bounded),
    size = (10, 10, 10),
    x = (0, 1),
    y = (0, 1),
    z = (-1, 0)
)

bottom(x, y) = -0.5

grid = ImmersedBoundaryGrid(underlying_grid, GridFittedBottom(bottom))

model = NonhydrostaticModel(; grid)

simulation = Simulation(model, Δt=0.01, stop_iteration=1)

run!(simulation)

One can still override the default FloatType for specific purposes / research, but this may cause compilation to fail.

Another way to make this code pass that avoids changing the default FloatType is to also specify the advection scheme:

using Oceananigans

underlying_grid = RectilinearGrid(GPU(), Float32;
    topology = (Bounded, Bounded, Bounded),
    size = (10, 10, 10),
    x = (0, 1),
    y = (0, 1),
    z = (-1, 0)
)

bottom(x, y) = -0.5
grid = ImmersedBoundaryGrid(underlying_grid, GridFittedBottom(bottom))
advection = Centered(Float32, order=2)
model = NonhydrostaticModel(; grid, advection)

simulation = Simulation(model, Δt=0.01, stop_iteration=1)

run!(simulation)

I didn't test that so let me know if it works.

ali-ramadhan · 2025-03-14T22:58:11Z

Ah thanks for clarifying! I thought changing the default to Centered(FT::DataType=Oceananigans.defaults.FloatType, ...) would fix the issue but the default was still Float64 since I didn't change it which makes sense.

I can confirm that both your examples work so I'll re-close the issue!
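The behaviour described above, where a constructor's default argument reads a package-wide setting, can be sketched in plain Julia. Everything below (`ToyDefaults`, `toy_defaults`, `ToyScheme`, `toy_centered`) is a hypothetical stand-in and not Oceananigans code; it only assumes that the default is looked up each time the constructor is called.

```julia
# Hypothetical stand-in for a mutable package-wide defaults object.
mutable struct ToyDefaults
    FloatType::DataType
end

const toy_defaults = ToyDefaults(Float64)

# Hypothetical stand-in for an advection scheme that stores typed coefficients.
struct ToyScheme{FT}
    coefficient::FT
end

# The default argument is evaluated at call time, so it sees the current setting.
toy_centered(FT::DataType = toy_defaults.FloatType) = ToyScheme{FT}(convert(FT, 0.5))

scheme_before = toy_centered()            # default left alone: Float64

toy_defaults.FloatType = Float32
scheme_after = toy_centered()             # picks up the new default: Float32

scheme_explicit = toy_centered(Float64)   # an explicit type overrides the default

println(typeof(scheme_before), " ", typeof(scheme_after), " ", typeof(scheme_explicit))
```

In this toy setting, a Float32 grid combined with `scheme_before` would mix precisions, which mirrors why either changing the default or passing Float32 to the scheme explicitly is needed.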

ali-ramadhan added the bug 🐞 and GPU 👾 labels Mar 10, 2025

glwagner mentioned this issue Mar 10, 2025

Use default FloatType in Centered advection #4193

Merged

glwagner closed this as completed in #4193 Mar 11, 2025

ali-ramadhan reopened this Mar 14, 2025

ali-ramadhan closed this as completed Mar 14, 2025
