Incorrect max_storage_buffer_binding_size with silent failure #2361

Closed
fh-igd-mueller-roemer opened this issue Jan 6, 2022 · 3 comments · Fixed by #2363
Labels
type: bug Something isn't working

Comments

@fh-igd-mueller-roemer

Description
On a device with a reported max_storage_buffer_binding_size of 4294967295 (2³² - 1), I should be able to allocate, bind, and access a storage buffer containing 1073741823 (⌊(2³² - 1) / 4⌋) f32 values in a compute shader, without silent failures.
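The element counts quoted in this report follow from dividing a byte limit by the size of an f32. A minimal sketch of that arithmetic (the helper name `max_f32_elements` is made up for illustration and is not part of wgpu):

```rust
/// Size of one `f32` element in bytes (4).
const F32_SIZE: u64 = std::mem::size_of::<f32>() as u64;

/// Largest number of whole `f32` values that fit into a binding of
/// `limit_bytes` bytes. Hypothetical helper, not a wgpu API.
fn max_f32_elements(limit_bytes: u64) -> u64 {
    // Integer division rounds down, matching the floor in the report.
    limit_bytes / F32_SIZE
}
```

Calling `max_f32_elements(u32::MAX as u64)` gives 1073741823, and `max_f32_elements(i32::MAX as u64)` gives 536870911, the size at which the failures start.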

Repro steps
See attached trace files below.

Expected vs observed behavior

Expected:

Allocation, binding, and access either succeed for buffers smaller than device.limits().max_storage_buffer_binding_size or fail with an error if an out-of-memory condition occurs.

Observed:

Allocation and binding succeed, but access fails silently for any storage buffer containing more than 536870911 (⌊(2³¹ - 1) / 4⌋) f32 values. On the Vulkan backend, once the array exceeds that size, no entries are read successfully at all (the reduction kernel produces -0.0). On the DX12 backend, entries beyond index 536870911 read as zero (the reduction produces 536870900 due to floating-point rounding). No suspicious log output is produced, apart from the ReportLiveObjects() warnings on termination that I always get in debug mode. The behavior is the same when Vulkan validation layers are enforced via the Vulkan Configurator.
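One plausible reading of the 536870900 figure, offered as an assumption rather than something the traces confirm: if every readable entry contributes 1.0, the exact count near 2²⁹ is not representable as an f32, so the result rounds to 2²⁹ = 536870912, and shortest round-trip decimal formatting prints that value as 536870900. A small sketch (the function name is hypothetical):

```rust
/// Convert an element count to `f32`, rounding to the nearest representable
/// value. Hypothetical helper used only to illustrate the rounding.
fn count_as_f32(count: u32) -> f32 {
    count as f32
}

/// Format an `f32` with Rust's default `Display`, which emits the shortest
/// decimal string that parses back to the same `f32`.
fn shortest_decimal(value: f32) -> String {
    format!("{}", value)
}
```

Here `count_as_f32(536_870_911)` equals `536_870_912.0`, because representable f32 values just below 2²⁹ are 32 apart, and `shortest_decimal` of that value yields a string that differs from the exact integer while still round-tripping.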

Given the size at which the failure starts, it seems an i32 is involved somewhere it shouldn't be. It is unclear whether the issue is driver- or wgpu-related. Alternatively, max_storage_buffer_binding_size may be misreported.
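To illustrate why the threshold points at a signed 32-bit value, the sketch below computes the byte offset of an f32 element in unsigned 32-bit arithmetic and then reinterprets it as an i32. This only models the suspected mechanism; it is an assumption about what might happen inside a driver or shader compiler, not observed behavior, and the function names are invented for the example.

```rust
/// Byte offset of element `index` in a tightly packed `f32` array,
/// computed in 32-bit unsigned arithmetic (wrapping on overflow).
fn byte_offset_u32(index: u32) -> u32 {
    index.wrapping_mul(4)
}

/// The same offset reinterpreted as a signed 32-bit integer, as a
/// toolchain that treats offsets as `i32` might see it (assumption).
fn byte_offset_as_i32(index: u32) -> i32 {
    byte_offset_u32(index) as i32
}
```

Index 536870911 still maps to a positive offset (2147483644), while index 536870912 maps to `i32::MIN`, so every element past the observed threshold would have a negative offset under this model.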

Extra materials
trace_dx12.zip.zip
trace_vulkan_2.zip.zip
(Double zipped to reduce size, slightly less than 4 GiB each when uncompressed)

Platform
Windows 10 21H2
rustc 1.57.0
NVIDIA GeForce 3070 Laptop
NVIDIA GeForce 510.06 driver (WSLg preview release)
wgpu 9aac778 (gecko-branch)

@kvark kvark added the type: bug Something isn't working label Jan 6, 2022
@kvark
Member

kvark commented Jan 6, 2022

Thank you for filing this beautifully detailed issue!
I checked the logic in wgpu-core and hal/vulkan, not seeing anything suspicious.
Would be interesting to see if fixing #2337 makes this any different.
Perhaps, you could also attach vulkaninfo output here?

@kvark kvark added external: driver-bug A driver is causing the bug, though we may still want to work around it and removed external: driver-bug A driver is causing the bug, though we may still want to work around it labels Jan 6, 2022
@kvark
Member

kvark commented Jan 6, 2022

Oh wait, I know what's going on here! It's not an API problem but a shader problem, although one that needs help from the API side.
Basically, in SPIR-V and HLSL, indices are always treated as i32 integers internally by driver/compiler toolchains, even if we use u32 all the way through.
cc @jimblandy

Short term solution: cap the max storage buffer binding size at 2^31 in all backends.
Long term solution: make Naga and wgpu cooperate on the buffers that need 64-bit addressing (internally!), generate appropriate code, etc. This is slightly complicated.
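The short-term idea amounts to reporting the smaller of the device limit and a fixed cap. A minimal sketch, assuming a cap of exactly 2^31 bytes as stated above; the constant and function names are hypothetical and are not taken from the wgpu source or from #2363:

```rust
/// Hypothetical cap of 2^31 bytes on storage buffer bindings, chosen so
/// that sizes and offsets stay addressable by toolchains that assume `i32`.
const I32_SAFE_BINDING_CAP: u32 = 1 << 31;

/// Clamp the limit reported by the backend to the cap before exposing it
/// as `max_storage_buffer_binding_size`. Illustrative only.
fn clamp_storage_binding_size(reported: u32) -> u32 {
    reported.min(I32_SAFE_BINDING_CAP)
}
```

With the value from this report, `clamp_storage_binding_size(4294967295)` returns 2147483648, while a device that already reports a smaller limit keeps its own value.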

@fh-igd-mueller-roemer
Author

> Thank you for filing this beautifully detailed issue!

You're welcome, and thank you for the quick fix/workaround!

> Perhaps, you could also attach vulkaninfo output here?

I assume this is obsolete now? In any case, the interesting bit is probably:

VkPhysicalDeviceLimits:
-----------------------
        [...]
        maxStorageBufferRange                           = 4294967295

So wgpu's limits match Vulkan's limits.

2 participants