-
Notifications
You must be signed in to change notification settings - Fork 597
[spirv-opt] Missed optimization opportunity in SROA #5998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
We recently had the exact same problem (although much worst, we ended up with a 2KB spill). Here is our minimal repro shader Neither NVidia nor AMD drivers seem optimize this, this needs to be fixed on the SpirV side. In a more general case: when a struct is loaded from a buffer (such as in For example this:
Will always load both foo and bar, wasting registers. This also blows up the SpirV code size when dealing with arrays. |
Is this fixed by #6016? |
Since we mostly deal with dxc, would you know how I can point it to a new version of spirv-opt? |
You would have to build DXC yourself with the updated spirv-tools for the submodule. We can wait until it lands to find out. I suspect it does not. Your using HLSL, but that fix is focused on a GLSL code pattern with newer versions of SPIR-V that use OpCopyLogical. I don't think DXC uses that instruction. |
More comments on the original test case.
DXC will translate this in a single load instruction that will load the entire array.
A variable index into the array make SROA impossible. We do not know which elements are needed. The correct pass is to fix copy propagate arrays as was done in #6016; however, without access to the code I cannot know for sure why that pass is failing in this case. That pass would essentially create the improved code on the right of the diff. I'm guessing it is because the original OpLoad of the array does not come from a variable. To fix this, we would have to be able to replace OpAccessChain instructions on |
The linked PR doesn't fix the repro in my example. |
Uh oh!
There was an error while loading. Please reload this page.
With struct
OffsetTable { float table[32] }
we recently had to do the following optimization to reduce VGPR usage drastically in one of our kernels. The optimization distinctly feels like something scalar replacement should've taken care of (instead of loading the 32 floats into registers & then selecting a register).From looking at it, the code in
scalar_replace.cpp
seems to be able to operate on array's as well, but it misses this case.In this particular case this resulted in a save of ~32 VGPRs because we could elide the table in its entirety.
Code uses HLSL & our own internal abstractions over buffer loads and stores to accommodate bindless
loadUniform
maps to:The text was updated successfully, but these errors were encountered: