Update thread load inline asm to be compatible with llvm
vgpr16 update (#205)
See
llvm/llvm-project@7f62800.
rocPRIM will fail to compile with the above LLVM change without this PR.
Deeper technical explanation:
The new code at
[SIISelLowering.cpp#L16205](https://github.com/llvm/llvm-project/blob/7ffdf4240d62724dca7f42b37bd8671fefe17e17/llvm/lib/Target/AMDGPU/SIISelLowering.cpp#L16205)
is correct, because this is how we would define 16 bit registers in
inline asm for those instructions that actually use 16 bit registers.
flat_load_ushort/flat_load_ubyte do not actually use 16-bit registers in
assembly; they use 32-bit registers. These instructions explicitly
zero-extend to 32 bits.
Note that instructions like flat_load_d16_b16 are a different case: they
do not zero-extend, but they still use 32-bit registers in assembly.
The simplest fix is to change interim_type to int32_t/uint32_t for all
loads.
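
To illustrate why a 32-bit interim type is safe for the narrow loads, here is a minimal host-side sketch. It is not the rocPRIM code and contains no inline asm: `emulate_zero_extending_load` and `thread_load_sketch` are hypothetical names that only model the behaviour described above, where a zero-extending load fills a full 32-bit register and the result is then narrowed back to the requested type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical host-side stand-in for a zero-extending load such as
// flat_load_ushort / flat_load_ubyte: it reads sizeof(T) bytes and writes
// the zero-extended value into a full 32-bit destination "register".
template <typename T>
void emulate_zero_extending_load(const void* ptr, std::uint32_t& dst_reg)
{
    static_assert(std::is_unsigned<T>::value && sizeof(T) <= sizeof(std::uint32_t),
                  "sketch only models unsigned loads of at most 32 bits");
    assert(ptr != nullptr);
    T narrow;
    std::memcpy(&narrow, ptr, sizeof(T));
    dst_reg = narrow; // upper bits become zero, as with the real instructions
}

// Sketch of the fix: the interim value is always 32 bits wide, matching the
// register the instruction writes, and is narrowed only on return.
template <typename T>
T thread_load_sketch(const void* ptr)
{
    using interim_type = std::uint32_t; // assumption: mirrors the PR's interim_type
    interim_type retval = 0xFFFFFFFFu;  // pretend the register held stale data
    emulate_zero_extending_load<T>(ptr, retval);
    return static_cast<T>(retval);      // truncation is lossless after zero-extension
}
```

Because the load zero-extends, the narrowing cast at the end never discards meaningful bits, so the same 32-bit interim type works for byte, short, and word loads alike.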