Incorrect max_storage_buffer_binding_size with silent failure #2361

fh-igd-mueller-roemer · 2022-01-06T16:09:57Z

Description
On a device with a reported max_storage_buffer_binding_size of 4294967295 (2³² - 1), I should be able to allocate, bind, and access a storage buffer containing 1073741823 (⌊(2³² - 1) / 4⌋) f32 values in a compute shader, without silent failures.

Repro steps
See attached trace files below.

Expected vs observed behavior

Expected:

Allocation, binding, and access either succeed for buffers smaller than device.limits().max_storage_buffer_binding_size or fail with an error if an out-of-memory condition occurs.

Observed:

Allocation and binding succeed; however, access fails silently for any storage buffer containing more than 536870911 (⌊(2³¹ - 1) / 4⌋) f32 values. On the Vulkan backend, no entries at all are accessed successfully for larger arrays (the reduction kernel produces a value of -0.0). On the DX12 backend, entries beyond index 536870911 are read as zero (the reduction produces a value of 536870900 due to floating-point rounding). No suspicious log output is produced (except the ReportLiveObjects() warnings on termination that I always get in debug mode). The same happens if Vulkan validation layers are enforced via the Vulkan Configurator.

Given the point at which the failure occurs, it seems an i32 is involved somewhere it shouldn't be. It is unclear whether the issue is driver- or wgpu-related. Alternatively, max_storage_buffer_binding_size may be misreported.
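The arithmetic behind the two thresholds can be checked with a small standalone Rust sketch. The helper names (`max_f32_elements`, `byte_offset_as_i32`) are hypothetical and only illustrate the signed 32-bit hypothesis; they do not show what any driver or shader compiler actually does.

```rust
/// Largest number of f32 values that fit in a binding of `limit_bytes` bytes.
/// (Hypothetical helper, for illustration only.)
fn max_f32_elements(limit_bytes: u64) -> u64 {
    limit_bytes / std::mem::size_of::<f32>() as u64
}

/// Byte offset of element `index`, reinterpreted as a signed 32-bit value.
/// This models the *assumed* failure mode: an offset stored in an i32.
fn byte_offset_as_i32(index: u64) -> i32 {
    let offset = index * std::mem::size_of::<f32>() as u64;
    // Truncate to 32 bits, then reinterpret the bit pattern as signed.
    offset as u32 as i32
}

fn main() {
    // Element count allowed by the reported limit of 2^32 - 1 bytes.
    println!("reported limit: {} f32 values", max_f32_elements(u32::MAX as u64));
    // Element count that still fits if offsets are limited to i32::MAX bytes.
    println!("i32 limit:      {} f32 values", max_f32_elements(i32::MAX as u64));
    // The last index that works and the first index that fails.
    println!("offset of index 536870911 as i32: {}", byte_offset_as_i32(536_870_911));
    println!("offset of index 536870912 as i32: {}", byte_offset_as_i32(536_870_912));
}
```

Index 536870912 is the first element whose byte offset (2³¹) no longer fits in an i32, which matches the observed point of failure.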

Extra materials
trace_dx12.zip.zip
trace_vulkan_2.zip.zip
(Double zipped to reduce size, slightly less than 4 GiB each when uncompressed)

Platform
Windows 10 21H2
rustc 1.57.0
NVIDIA GeForce 3070 Laptop
NVIDIA GeForce 510.06 driver (WSLg preview release)
wgpu 9aac778 (gecko-branch)


kvark · 2022-01-06T16:44:11Z

Thank you for filing this beautifully detailed issue!
I checked the logic in wgpu-core and hal/vulkan, not seeing anything suspicious.
Would be interesting to see if fixing #2337 makes this any different.
Perhaps, you could also attach vulkaninfo output here?

kvark · 2022-01-06T16:48:42Z

Oh wait, I know what's going on here! It's not an API problem, it's a shader problem, though one that needs help from the API side.
Basically, in SPIR-V and HLSL, indices are always treated as i32 integers internally by driver/compiler toolchains, even if we use u32 all the way through.
cc @jimblandy

Short term solution: limit the max storage binding sizes to 2^31 in all backends.
Long term solution: make Naga and wgpu cooperate on the buffers that need 64-bit addressing (internally!), generate appropriate code, etc. This is slightly complicated.
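A minimal sketch of the short-term idea, assuming the clamp is to `i32::MAX` bytes (the title of #2363 says "limit binding sizes to i32"). The function names below are hypothetical and this is not the code from that pull request; it only shows the shape of such a clamp and of a matching application-side check.

```rust
/// Clamp a backend-reported storage binding limit so that every byte offset
/// inside a binding fits in a signed 32-bit integer.
/// (Hypothetical helper; the i32::MAX bound is an assumption.)
fn clamp_binding_size(reported: u32) -> u32 {
    reported.min(i32::MAX as u32)
}

/// Application-side guard: does a buffer of `len` f32 values fit in one binding?
fn f32_buffer_fits(len: u64, binding_limit: u32) -> bool {
    len * std::mem::size_of::<f32>() as u64 <= binding_limit as u64
}

fn main() {
    // Value reported by the device in this issue: 2^32 - 1.
    let reported: u32 = u32::MAX;
    let limit = clamp_binding_size(reported);
    println!("clamped limit: {} bytes", limit);

    // The largest buffer that worked in the report, and the first size that did not.
    println!("536870911 f32 values fit: {}", f32_buffer_fits(536_870_911, limit));
    println!("536870912 f32 values fit: {}", f32_buffer_fits(536_870_912, limit));
}
```

With a clamped limit, an oversized binding can be rejected up front rather than failing silently at access time.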

fh-igd-mueller-roemer · 2022-01-07T09:00:42Z

> Thank you for filing this beautifully detailed issue!

You're welcome, and thank you for the quick fix/workaround!

> Perhaps, you could also attach vulkaninfo output here?

I assume this is obsolete now? In any case, the interesting bit is probably

VkPhysicalDeviceLimits:
-----------------------
        [...]
        maxStorageBufferRange                           = 4294967295

So wgpu's limits match Vulkan's limits.

kvark added the type: bug Something isn't working label Jan 6, 2022

kvark added external: driver-bug A driver is causing the bug, though we may still want to work around it and removed external: driver-bug A driver is causing the bug, though we may still want to work around it labels Jan 6, 2022

kvark mentioned this issue Jan 6, 2022

hal: limit binding sizes to i32 #2363

Merged

kvark closed this as completed in #2363 Jan 6, 2022
