
[AMDGPU] Define constrained multi-dword scalar load instructions. #96161


Merged: 3 commits, Jul 23, 2024

Conversation

@cdevadas (Collaborator)

No description provided.

@cdevadas (Collaborator, Author) commented Jun 20, 2024

@llvmbot (Member) commented Jun 20, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Christudasan Devadasan (cdevadas)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/96161.diff

1 file affected:

  • (modified) llvm/lib/Target/AMDGPU/SMInstructions.td (+14)
diff --git a/llvm/lib/Target/AMDGPU/SMInstructions.td b/llvm/lib/Target/AMDGPU/SMInstructions.td
index df1722b1f7fb4..4551a3a615b15 100644
--- a/llvm/lib/Target/AMDGPU/SMInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SMInstructions.td
@@ -167,6 +167,20 @@ multiclass SM_Pseudo_Loads<RegisterClass baseClass,
   def _IMM : SM_Load_Pseudo <opName, baseClass, dstClass, IMM_Offset>;
   def _SGPR : SM_Load_Pseudo <opName, baseClass, dstClass, SGPR_Offset>;
   def _SGPR_IMM : SM_Load_Pseudo <opName, baseClass, dstClass, SGPR_IMM_Offset>;
+
+  // The constrained multi-dword load equivalents with early clobber flag at
+  // the dst operand. They are needed only for codegen and there is no need for
+  // their real opcodes.
+  let SubtargetPredicate = isGFX8Plus,
+      Constraints = !if(!gt(dstClass.RegTypes[0].Size, 32),
+                         "@earlyclobber $sdst", "") in {
+    let PseudoInstr = NAME # !cast<OffsetMode>(IMM_Offset).Variant in
+      def _IMM_ec : SM_Load_Pseudo <opName, baseClass, dstClass, IMM_Offset>;
+    let PseudoInstr = NAME # !cast<OffsetMode>(SGPR_Offset).Variant in
+      def _SGPR_ec : SM_Load_Pseudo <opName, baseClass, dstClass, SGPR_Offset>;
+    let PseudoInstr = NAME # !cast<OffsetMode>(SGPR_IMM_Offset).Variant in
+      def _SGPR_IMM_ec : SM_Load_Pseudo <opName, baseClass, dstClass, SGPR_IMM_Offset>;
+  }
 }
 
 multiclass SM_Pseudo_Stores<RegisterClass baseClass,
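The `!if` in the patch attaches the constraint string only when the destination register class is wider than 32 bits. A minimal Python sketch of that selection (the function name, and modeling it outside TableGen at all, are mine, not LLVM's):

```python
# Hypothetical sketch, not LLVM code: mirrors the TableGen expression
#   !if(!gt(dstClass.RegTypes[0].Size, 32), "@earlyclobber $sdst", "")
# A destination wider than 32 bits spans multiple SGPRs, so the register
# allocator must keep the destination tuple from overlapping still-live
# address/offset source registers; earlyclobber on $sdst enforces that.

def constraint_for(dst_bits: int) -> str:
    return "@earlyclobber $sdst" if dst_bits > 32 else ""

# s_load_dword (32-bit result) needs no constraint; s_load_dwordx2 and
# wider destinations get the earlyclobber constraint.
print(constraint_for(32))   # -> "" (empty)
print(constraint_for(64))   # -> "@earlyclobber $sdst"
```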

@rampitec (Collaborator)

I also must say that s_buffer_load is in the same bucket.

@rampitec (Collaborator)

I just wish to have some optimization here: most of these loads are from kernarg. We know that kernarg is page aligned (I guess?). We also know the minimal page size and the kernarg size. So if the kernarg size is no greater than the page size, skip it. Or if kernarg is not page aligned, make it page aligned.
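The heuristic being proposed can be sketched as follows. This is my own paraphrase of the idea, with an assumed 4 KiB minimum page size and invented names; it is not part of the patch:

```python
# Hypothetical sketch of the proposed kernarg optimization: if the kernarg
# segment starts page aligned and fits in one page, an over-wide scalar
# load from it cannot fault on an unmapped page, so the constrained (_ec)
# form could be skipped. MIN_PAGE_SIZE = 4096 is an assumption.

MIN_PAGE_SIZE = 4096

def needs_constrained_load(kernarg_base: int, kernarg_size: int) -> bool:
    page_aligned = kernarg_base % MIN_PAGE_SIZE == 0
    fits_in_page = kernarg_size <= MIN_PAGE_SIZE
    # Only when both hold is an out-of-bounds over-read provably harmless.
    return not (page_aligned and fits_in_page)
```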

@rampitec (Collaborator)

> I just wish to have some optimization here: most of these loads are from kernarg. We know that kernarg is page aligned (I guess?). We also know a minimal page size and kernarg size. So if kernarg size is no greater than page size, skip it. Or if kernarg is not page aligned, make it page aligned.

BTW, with that you will probably end up with exactly zero kernels falling into this category.

@cdevadas (Collaborator, Author)

> I also must say that s_buffer_load is in the same bucket.

Buffer loads are generated for the buffer intrinsics, and we take care of them during legalization so they always use natural alignment. But we might require the *_ec buffer opcodes for SILoadStoreOptimizer while merging them. @rampitec, do you think that can happen?
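The merging concern can be illustrated with a small sketch (the function name and the byte arithmetic are mine, not LLVM code): two naturally aligned narrow loads can merge into a wider load whose offset is no longer naturally aligned for the merged width.

```python
# Hypothetical sketch of the alignment concern when SILoadStoreOptimizer
# merges adjacent scalar loads; not actual LLVM code.

def merged_is_naturally_aligned(offset: int, width_a: int, width_b: int) -> bool:
    merged_width = width_a + width_b       # bytes covered by the merged load
    return offset % merged_width == 0      # natural alignment: offset % size == 0

# Two 8-byte (dwordx2) loads at offsets 8 and 16 merge into one 16-byte
# (dwordx4) load at offset 8, which is not 16-byte aligned:
print(merged_is_naturally_aligned(8, 8, 8))   # -> False
print(merged_is_naturally_aligned(0, 8, 8))   # -> True
```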

@rampitec (Collaborator)

> But we might require the *_ec buffer opcodes for SILoadStoreOptimizer while merging them. @rampitec do you think that can happen?

It can happen, I believe.

@cdevadas (Collaborator, Author) commented Jul 1, 2024

> It can happen I believe.

The buffer instructions aren't constrained yet. Can I do that in a separate patch?

@rampitec (Collaborator) commented Jul 1, 2024

> The buffer instructions aren't constrained yet. Can I do that in a separate patch?

Yes, this is just a note; it is not required for this patch.

@jayfoad (Contributor) left a comment

LGTM modulo @arsenm's suggestions

@cdevadas force-pushed the users/cdevadas/constrained-sload-insns branch 2 times, most recently from d7c254b to f7c8bca on July 23, 2024 at 06:40
@cdevadas (Collaborator, Author) commented Jul 23, 2024

Merge activity

  • Jul 23, 4:02 AM EDT: @cdevadas started a stack merge that includes this pull request via Graphite.
  • Jul 23, 4:04 AM EDT: Graphite rebased this pull request as part of a merge.
  • Jul 23, 4:06 AM EDT: @cdevadas merged this pull request with Graphite.

@cdevadas force-pushed the users/cdevadas/constrained-sload-insns branch from f7c8bca to 2850b4f on July 23, 2024 at 08:04
@cdevadas cdevadas merged commit eeb7feb into main Jul 23, 2024
4 of 7 checks passed
@cdevadas cdevadas deleted the users/cdevadas/constrained-sload-insns branch July 23, 2024 08:06
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
5 participants