Commit 8724af3

[rocm-libraries] ROCm/rocm-libraries#205 (commit 23c5a8a)
Update thread load inline asm to be compatible with the LLVM vgpr16 update (#205)

See llvm/llvm-project@7f62800. Without this PR, rocPRIM fails to compile against that LLVM change.

Deeper technical explanation: The new code at [SIISelLowering.cpp#L16205](https://github.com/llvm/llvm-project/blob/7ffdf4240d62724dca7f42b37bd8671fefe17e17/llvm/lib/Target/AMDGPU/SIISelLowering.cpp#L16205) is correct, because that is how 16-bit registers are defined in inline asm for instructions that actually use 16-bit registers. flat_load_ushort and flat_load_ubyte do not use 16-bit registers in assembly; they use 32-bit registers, and they explicitly zero-extend the loaded value to 32 bits. As a separate case, instructions like flat_load_d16_b16 do not zero-extend, but they still use 32-bit registers in assembly.

The simplest fix is to change interim_type to int32_t/uint32_t for all of the 8-bit and 16-bit loads.
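To illustrate why a 32-bit interim type is safe here, the following is a minimal host-side sketch, not the rocPRIM implementation. The functions `model_flat_load_ubyte`, `model_flat_load_sbyte`, and `narrow_from_interim` are hypothetical names introduced only for this example. They model a byte load that fills a full 32-bit register (zero-extended for the unsigned case as described above; sign extension for the signed case is an assumption of this sketch) and then narrow the result back to the requested type.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical host-side model of flat_load_ubyte: the loaded byte is
// zero-extended into a full 32-bit "register".
inline std::uint32_t model_flat_load_ubyte(const void* ptr)
{
    std::uint8_t raw;
    std::memcpy(&raw, ptr, sizeof(raw));
    return static_cast<std::uint32_t>(raw);
}

// Hypothetical host-side model of flat_load_sbyte. Sign extension to
// 32 bits is an assumption of this sketch.
inline std::int32_t model_flat_load_sbyte(const void* ptr)
{
    std::int8_t raw;
    std::memcpy(&raw, ptr, sizeof(raw));
    return static_cast<std::int32_t>(raw);
}

// Narrow the 32-bit interim value back to the type the caller asked for.
// The static_assert mirrors the idea behind the fix: the interim type
// matches the 32-bit register that the instruction actually writes.
template<class T, class Interim>
inline T narrow_from_interim(Interim value)
{
    static_assert(sizeof(Interim) == 4, "interim type must be 32 bits wide");
    return static_cast<T>(value);
}
```

Because the interim value already holds the correctly extended byte, narrowing it to `uint8_t` or `int8_t` returns the original value; only the width of the register operand changes, not the result seen by the caller.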
1 parent d6a8d9d commit 8724af3

1 file changed, +4 -4 lines changed

rocprim/include/rocprim/thread/thread_load.hpp

Lines changed: 4 additions & 4 deletions
@@ -102,10 +102,10 @@ T asm_thread_load(void* ptr)
 // TODO Add specialization for custom larger data types
 // clang-format off
 #define ROCPRIM_ASM_THREAD_LOAD_GROUP(cache_modifier, llvm_cache_modifier, wait_inst, wait_cmd) \
-    ROCPRIM_ASM_THREAD_LOAD(cache_modifier, llvm_cache_modifier, int8_t, int16_t, flat_load_sbyte, v, wait_inst, wait_cmd); \
-    ROCPRIM_ASM_THREAD_LOAD(cache_modifier, llvm_cache_modifier, int16_t, int16_t, flat_load_sshort, v, wait_inst, wait_cmd); \
-    ROCPRIM_ASM_THREAD_LOAD(cache_modifier, llvm_cache_modifier, uint8_t, uint16_t, flat_load_ubyte, v, wait_inst, wait_cmd); \
-    ROCPRIM_ASM_THREAD_LOAD(cache_modifier, llvm_cache_modifier, uint16_t, uint16_t, flat_load_ushort, v, wait_inst, wait_cmd); \
+    ROCPRIM_ASM_THREAD_LOAD(cache_modifier, llvm_cache_modifier, int8_t, int32_t, flat_load_sbyte, v, wait_inst, wait_cmd); \
+    ROCPRIM_ASM_THREAD_LOAD(cache_modifier, llvm_cache_modifier, int16_t, int32_t, flat_load_sshort, v, wait_inst, wait_cmd); \
+    ROCPRIM_ASM_THREAD_LOAD(cache_modifier, llvm_cache_modifier, uint8_t, uint32_t, flat_load_ubyte, v, wait_inst, wait_cmd); \
+    ROCPRIM_ASM_THREAD_LOAD(cache_modifier, llvm_cache_modifier, uint16_t, uint32_t, flat_load_ushort, v, wait_inst, wait_cmd); \
     ROCPRIM_ASM_THREAD_LOAD(cache_modifier, llvm_cache_modifier, uint32_t, uint32_t, flat_load_dword, v, wait_inst, wait_cmd); \
     ROCPRIM_ASM_THREAD_LOAD(cache_modifier, llvm_cache_modifier, float, uint32_t, flat_load_dword, v, wait_inst, wait_cmd); \
     ROCPRIM_ASM_THREAD_LOAD(cache_modifier, llvm_cache_modifier, uint64_t, uint64_t, flat_load_dwordx2, v, wait_inst, wait_cmd); \
