Skip to content

[spirv-opt] Missed optimization opportunity in SROA #5998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Jasper-Bekkers opened this issue Feb 14, 2025 · 6 comments
Open

[spirv-opt] Missed optimization opportunity in SROA #5998

Jasper-Bekkers opened this issue Feb 14, 2025 · 6 comments

Comments

@Jasper-Bekkers
Copy link

Jasper-Bekkers commented Feb 14, 2025

With struct OffsetTable { float table[32] } we recently had to do the following optimization to reduce VGPR usage drastically in one of our kernels. The optimization distinctly feels like something scalar replacement should've taken care of (instead of loading the 32 floats into registers & then selecting a register).

From looking at it, the code in scalar_replace.cpp seems to be able to operate on array's as well, but it misses this case.

In this particular case this resulted in a save of ~32 VGPRs because we could elide the table in its entirety.

Image

Code uses HLSL & our own internal abstractions over buffer loads and stores to accommodate bindless

loadUniform maps to:

ByteAddressBuffer buffer = ResourceDescriptorHeap[internalHandle];
return buffer.Load<T>(index * sizeof(T));
@Jasper-Bekkers Jasper-Bekkers changed the title [spirv-opt] Missed optimization in SROA [spirv-opt] Missed optimization opportunity in SROA Feb 14, 2025
@gan74
Copy link

gan74 commented Feb 25, 2025

We recently had the exact same problem (although much worst, we ended up with a 2KB spill). Here is our minimal repro shader

Neither NVidia nor AMD drivers seem optimize this, this needs to be fixed on the SpirV side.

In a more general case: when a struct is loaded from a buffer (such as in OffsetTable offsetTable = bnd.loadUniform<OffsetTable>(); above) all potentially used members are loaded immediately.
Members that are only conditionally used should only be loaded on demand, to save registers.

For example this:

Material material = load_material_from_buffer();
if(something) {
    do_something(object.foo);
} else {
    do_something(object.bar);
}

Will always load both foo and bar, wasting registers.

This also blows up the SpirV code size when dealing with arrays.

@s-perron
Copy link
Collaborator

Is this fixed by #6016?

@Jasper-Bekkers
Copy link
Author

Is this fixed by #6016?

Since we mostly deal with dxc, would you know how I can point it to a new version of spirv-opt?

@s-perron
Copy link
Collaborator

s-perron commented Mar 4, 2025

You would have to build DXC yourself with the updated spirv-tools for the submodule. We can wait until it lands to find out. I suspect it does not. Your using HLSL, but that fix is focused on a GLSL code pattern with newer versions of SPIR-V that use OpCopyLogical. I don't think DXC uses that instruction.

@s-perron
Copy link
Collaborator

s-perron commented Mar 4, 2025

More comments on the original test case.

OffsetTable offsetTable bnd.offsetTable.loadUniform();

DXC will translate this in a single load instruction that will load the entire array.

uint numEntries offsetTable.table [layer + 1] ...

A variable index into the array make SROA impossible. We do not know which elements are needed.

The correct pass is to fix copy propagate arrays as was done in #6016; however, without access to the code I cannot know for sure why that pass is failing in this case. That pass would essentially create the improved code on the right of the diff. I'm guessing it is because the original OpLoad of the array does not come from a variable.

To fix this, we would have to be able to replace OpAccessChain instructions on offsetTable.table with an addition of to the original int value that was cast to a pointer for the raw buffer load. Then do another cast. Off the top of my head, I can remember how hard that would be to implement.

@gan74
Copy link

gan74 commented Mar 5, 2025

Is this fixed by #6016?

The linked PR doesn't fix the repro in my example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants