[AMDGPU] Define constrained multi-dword scalar load instructions. #96161
Conversation
@llvm/pr-subscribers-backend-amdgpu
Author: Christudasan Devadasan (cdevadas)
Changes: Full diff: https://github.com/llvm/llvm-project/pull/96161.diff
1 file affected: llvm/lib/Target/AMDGPU/SMInstructions.td
diff --git a/llvm/lib/Target/AMDGPU/SMInstructions.td b/llvm/lib/Target/AMDGPU/SMInstructions.td
index df1722b1f7fb4..4551a3a615b15 100644
--- a/llvm/lib/Target/AMDGPU/SMInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SMInstructions.td
@@ -167,6 +167,20 @@ multiclass SM_Pseudo_Loads<RegisterClass baseClass,
def _IMM : SM_Load_Pseudo <opName, baseClass, dstClass, IMM_Offset>;
def _SGPR : SM_Load_Pseudo <opName, baseClass, dstClass, SGPR_Offset>;
def _SGPR_IMM : SM_Load_Pseudo <opName, baseClass, dstClass, SGPR_IMM_Offset>;
+
+ // The constrained multi-dword load equivalents with early clobber flag at
+ // the dst operand. They are needed only for codegen and there is no need for
+ // their real opcodes.
+ let SubtargetPredicate = isGFX8Plus,
+ Constraints = !if(!gt(dstClass.RegTypes[0].Size, 32),
+ "@earlyclobber $sdst", "") in {
+ let PseudoInstr = NAME # !cast<OffsetMode>(IMM_Offset).Variant in
+ def _IMM_ec : SM_Load_Pseudo <opName, baseClass, dstClass, IMM_Offset>;
+ let PseudoInstr = NAME # !cast<OffsetMode>(SGPR_Offset).Variant in
+ def _SGPR_ec : SM_Load_Pseudo <opName, baseClass, dstClass, SGPR_Offset>;
+ let PseudoInstr = NAME # !cast<OffsetMode>(SGPR_IMM_Offset).Variant in
+ def _SGPR_IMM_ec : SM_Load_Pseudo <opName, baseClass, dstClass, SGPR_IMM_Offset>;
+ }
}
multiclass SM_Pseudo_Stores<RegisterClass baseClass,
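For context on what the new @earlyclobber constraint expresses, here is a minimal, hypothetical TableGen sketch. It is not part of this patch: the MY_LOAD_DWORDX2_ec pseudo is invented, and only the SReg_64 register class and i32imm operand from the existing backend are assumed. Marking the destination as early-clobber tells the register allocator that the result may be written before the source operands are fully consumed, so $sdst must not be assigned registers that overlap $sbase.

```tablegen
// Minimal sketch, assuming an AMDGPU-style .td context where the SReg_64
// register class and the i32imm operand are available.
// MY_LOAD_DWORDX2_ec is a hypothetical codegen-only pseudo, not a definition
// taken from this patch.
def MY_LOAD_DWORDX2_ec : Instruction {
  let OutOperandList = (outs SReg_64:$sdst);                 // 64-bit (2-dword) result
  let InOperandList  = (ins SReg_64:$sbase, i32imm:$offset); // base address + immediate offset
  let Constraints    = "@earlyclobber $sdst";                // result must not alias the sources
  let isPseudo       = 1;                                    // codegen-only, no real encoding
}
```

In the multiclass above, the constraint string is produced by !if(!gt(dstClass.RegTypes[0].Size, 32), "@earlyclobber $sdst", ""), so only destinations wider than 32 bits (the multi-dword loads) become constrained, and the PseudoInstr overrides let the _ec variants reuse the same offset-mode variant names as the unconstrained pseudos.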
I must also say that s_buffer_load is in the same bucket.
I just wish to have some optimization here: most of these loads are from kernarg. We know that kernarg is page aligned (I guess?). We also know the minimum page size and the kernarg size. So if the kernarg size is no greater than the page size, skip it. Or if kernarg is not page aligned, make it page aligned.
> I just wish to have some optimization here: most of these loads are from kernarg. We know that kernarg is page aligned (I guess?). We also know the minimum page size and the kernarg size. So if the kernarg size is no greater than the page size, skip it. Or if kernarg is not page aligned, make it page aligned.

JBTW, with that you will probably end up with exactly zero kernels falling into this category.
Buffer loads are generated for the buffer intrinsics, and we take care of them during legalization so that they always use natural alignment. But we might require the *_ec buffer opcodes for SILoadStoreOptimizer when it merges them. @rampitec, do you think that can happen?
It can happen I believe. |
The buffer instructions aren't constrained yet. Can I do that in a separate patch? |
|
LGTM modulo @arsenm's suggestions
Force-pushed from d7c254b to f7c8bca.
Force-pushed from f7c8bca to 2850b4f.