
[AMDGPU] Define constrained multi-dword scalar load instructions. #96161


Merged: 3 commits, Jul 23, 2024

Conversation

@cdevadas (Collaborator)

No description provided.

@cdevadas (Collaborator, Author) commented Jun 20, 2024

@llvmbot (Member) commented Jun 20, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Christudasan Devadasan (cdevadas)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/96161.diff

1 file affected:

  • (modified) llvm/lib/Target/AMDGPU/SMInstructions.td (+14)
diff --git a/llvm/lib/Target/AMDGPU/SMInstructions.td b/llvm/lib/Target/AMDGPU/SMInstructions.td
index df1722b1f7fb4..4551a3a615b15 100644
--- a/llvm/lib/Target/AMDGPU/SMInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SMInstructions.td
@@ -167,6 +167,20 @@ multiclass SM_Pseudo_Loads<RegisterClass baseClass,
   def _IMM : SM_Load_Pseudo <opName, baseClass, dstClass, IMM_Offset>;
   def _SGPR : SM_Load_Pseudo <opName, baseClass, dstClass, SGPR_Offset>;
   def _SGPR_IMM : SM_Load_Pseudo <opName, baseClass, dstClass, SGPR_IMM_Offset>;
+
+  // The constrained multi-dword load equivalents with early clobber flag at
+  // the dst operand. They are needed only for codegen and there is no need for
+  // their real opcodes.
+  let SubtargetPredicate = isGFX8Plus,
+      Constraints = !if(!gt(dstClass.RegTypes[0].Size, 32),
+                         "@earlyclobber $sdst", "") in {
+    let PseudoInstr = NAME # !cast<OffsetMode>(IMM_Offset).Variant in
+      def _IMM_ec : SM_Load_Pseudo <opName, baseClass, dstClass, IMM_Offset>;
+    let PseudoInstr = NAME # !cast<OffsetMode>(SGPR_Offset).Variant in
+      def _SGPR_ec : SM_Load_Pseudo <opName, baseClass, dstClass, SGPR_Offset>;
+    let PseudoInstr = NAME # !cast<OffsetMode>(SGPR_IMM_Offset).Variant in
+      def _SGPR_IMM_ec : SM_Load_Pseudo <opName, baseClass, dstClass, SGPR_IMM_Offset>;
+  }
 }
 
 multiclass SM_Pseudo_Stores<RegisterClass baseClass,
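The `!if` in the patch attaches the constraint string only when the destination register class is wider than 32 bits. A minimal Python sketch of that selection (the function name, and modeling it outside TableGen at all, are mine, not LLVM's):

```python
# Hypothetical sketch, not LLVM code: mirrors the TableGen expression
#   !if(!gt(dstClass.RegTypes[0].Size, 32), "@earlyclobber $sdst", "")
# A destination wider than 32 bits spans multiple SGPRs, so the register
# allocator must keep the destination tuple from overlapping still-live
# address/offset source registers; earlyclobber on $sdst enforces that.

def constraint_for(dst_bits: int) -> str:
    return "@earlyclobber $sdst" if dst_bits > 32 else ""

# s_load_dword (32-bit result) needs no constraint; s_load_dwordx2 and
# wider destinations get the earlyclobber constraint.
print(constraint_for(32))   # -> "" (empty)
print(constraint_for(64))   # -> "@earlyclobber $sdst"
```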

@rampitec (Collaborator)

I also must say that s_buffer_load is in the same bucket.

@rampitec (Collaborator)

I just wish to have some optimization here: most of these loads are from kernarg. We know that kernarg is page aligned (I guess?). We also know the minimal page size and the kernarg size. So if the kernarg size is no greater than the page size, skip it. Or if kernarg is not page aligned, make it page aligned.
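The heuristic being proposed can be sketched as follows. This is my own paraphrase of the idea, with an assumed 4 KiB minimum page size and invented names; it is not part of the patch:

```python
# Hypothetical sketch of the proposed kernarg optimization: if the kernarg
# segment starts page aligned and fits in one page, an over-wide scalar
# load from it cannot fault on an unmapped page, so the constrained (_ec)
# form could be skipped. MIN_PAGE_SIZE = 4096 is an assumption.

MIN_PAGE_SIZE = 4096

def needs_constrained_load(kernarg_base: int, kernarg_size: int) -> bool:
    page_aligned = kernarg_base % MIN_PAGE_SIZE == 0
    fits_in_page = kernarg_size <= MIN_PAGE_SIZE
    # Only when both hold is an out-of-bounds over-read provably harmless.
    return not (page_aligned and fits_in_page)
```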

@rampitec (Collaborator)

> I just wish to have some optimization here: most of these loads are from kernarg. We know that kernarg is page aligned (I guess?). We also know a minimal page size and kernarg size. So if kernarg size is no greater than page size, skip it. Or if kernarg is not page aligned, make it page aligned.

BTW, with that you will probably end up with exactly zero kernels falling into this category.

@cdevadas (Collaborator, Author)

> I also must say that s_buffer_load is in the same bucket.

Buffer loads are generated for the buffer intrinsics, and we take care of them during legalization so they always use natural alignment. But we might require the *_ec buffer opcodes for SILoadStoreOptimizer while merging them. @rampitec, do you think that can happen?
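The merging concern can be illustrated with a small sketch (the function name and the byte arithmetic are mine, not LLVM code): two naturally aligned narrow loads can merge into a wider load whose offset is no longer naturally aligned for the merged width.

```python
# Hypothetical sketch of the alignment concern when SILoadStoreOptimizer
# merges adjacent scalar loads; not actual LLVM code.

def merged_is_naturally_aligned(offset: int, width_a: int, width_b: int) -> bool:
    merged_width = width_a + width_b       # bytes covered by the merged load
    return offset % merged_width == 0      # natural alignment: offset % size == 0

# Two 8-byte (dwordx2) loads at offsets 8 and 16 merge into one 16-byte
# (dwordx4) load at offset 8, which is not 16-byte aligned:
print(merged_is_naturally_aligned(8, 8, 8))   # -> False
print(merged_is_naturally_aligned(0, 8, 8))   # -> True
```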

@rampitec (Collaborator)

> But we might require the *_ec buffer opcodes for SILoadStoreOptimizer while merging them. @rampitec do you think that can happen?

It can happen, I believe.

@cdevadas (Collaborator, Author) commented Jul 1, 2024

> It can happen I believe.

The buffer instructions aren't constrained yet. Can I do that in a separate patch?

@rampitec (Collaborator) commented Jul 1, 2024

> The buffer instructions aren't constrained yet. Can I do that in a separate patch?

Yes, this is just a note; it is not required for this patch.

@jayfoad (Contributor) left a comment

LGTM modulo @arsenm's suggestions

@cdevadas force-pushed the users/cdevadas/constrained-sload-insns branch 2 times, most recently from d7c254b to f7c8bca on July 23, 2024 at 06:40
@cdevadas (Collaborator, Author) commented Jul 23, 2024

Merge activity

  • Jul 23, 4:02 AM EDT: @cdevadas started a stack merge that includes this pull request via Graphite.
  • Jul 23, 4:04 AM EDT: Graphite rebased this pull request as part of a merge.
  • Jul 23, 4:06 AM EDT: @cdevadas merged this pull request with Graphite.

@cdevadas force-pushed the users/cdevadas/constrained-sload-insns branch from f7c8bca to 2850b4f on July 23, 2024 at 08:04
@cdevadas cdevadas merged commit eeb7feb into main Jul 23, 2024
4 of 7 checks passed
@cdevadas cdevadas deleted the users/cdevadas/constrained-sload-insns branch July 23, 2024 08:06
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
5 participants