Skip to content

[AMDGPU] Implement workaround for GFX11.5 export priority #99273

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 23, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 14 additions & 4 deletions llvm/lib/Target/AMDGPU/AMDGPU.td
Original file line number Diff line number Diff line change
Expand Up @@ -947,6 +947,12 @@ def FeatureHasRestrictedSOffset : SubtargetFeature<"restricted-soffset",
"Has restricted SOffset (immediate not supported)."
>;

def FeatureRequiredExportPriority : SubtargetFeature<"required-export-priority",
"HasRequiredExportPriority",
"true",
"Export priority must be explicitly manipulated on GFX11.5"
>;

//===------------------------------------------------------------===//
// Subtarget Features (options and debugging)
//===------------------------------------------------------------===//
Expand Down Expand Up @@ -1567,7 +1573,8 @@ def FeatureISAVersion11_Generic: FeatureSet<
FeatureUserSGPRInit16Bug,
FeatureMADIntraFwdBug,
FeaturePrivEnabledTrap2NopBug,
FeatureRequiresCOV6])>;
FeatureRequiresCOV6,
FeatureRequiredExportPriority])>;

def FeatureISAVersion11_0_Common : FeatureSet<
!listconcat(FeatureISAVersion11_Common.Features,
Expand Down Expand Up @@ -1597,20 +1604,23 @@ def FeatureISAVersion11_5_0 : FeatureSet<
!listconcat(FeatureISAVersion11_Common.Features,
[FeatureSALUFloatInsts,
FeatureDPPSrc1SGPR,
FeatureVGPRSingleUseHintInsts])>;
FeatureVGPRSingleUseHintInsts,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug workaround also need to be added to the generic target

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a generic target for 11.5, these don't apply to all of 11, only the three 11.5 targets definitions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is ISAVersion11_Generic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gfx11-generic includes gfx115x.

I don't think there is a generic target for 11.5, these don't apply to all of 11, only the three 11.5 targets definitions.

If this is a codegen pessimization to fix a bug, and it's not incompatible with gfx110x (just makes them slower), then it needs to go in the generic target.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

There seems to be some inconsistency with generics in general.
GFX10 has separate 10.1, 10.3 generic targets, I guess because .0 is not defined.
GFX 9 generic target doesn't follow this workaround absorbing convention, and 9.4 doesn't have a generic target.
I guess splitting GFX11 to .0 and .5 now would not make sense if someone was using the gfx11-generic target, as it would still need to stay defined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GFX10 has separate 10.1, 10.3 generic targets

It's inconsistent because the hardware is inconsistent. Ideally they wouldn't have broken compatibility so badly between minor gfx10 revisions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GFX 9 generic target doesn't follow this workaround absorbing convention,

It certainly is supposed to

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be some inconsistency with generics in general.

We have 2 families for GFX10 because GFX101x and GFX103x have incompatible ISAs due to encoding changes.

GFX10 has separate 10.1, 10.3 generic targets, I guess because .0 is not defined.

The naming convention is just:

  • If we have one family for the whole generation, just use the family name (gfx9-generic)
  • If we have more, then name each target after the earlier/lowest number in the family (gfx101x -> gfx10.1-generic)

GFX 9 generic target doesn't follow this workaround absorbing convention

What do you mean? We should have all workarounds included in gfx9-generic, if not, that's an issue.
Can you please tell me which workaround you think is missing? I'll follow up on it.

and 9.4 doesn't have a generic target.

Generic targets are for the case where one decides that performance isn't essential (or ideal performance cannot be achieved anyway), and it just needs to work and get up and running fast (= easy to distribute). In those cases, building only 2 targets for GFX10 instead of 11 is a huge time and space saver.

It makes little to no sense for accelerators (e.g; gfx90a) to have generic targets for multiple reasons:

  • They're designed for max performance on compute workloads. Performance >>> build times here.
  • It's pretty much impossible to make a generic target for, e.g. gfx9 that includes MI. The biggest issue IIRC is that the memory model is radically different between consumer and MI gfx9.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GFX 9 generic target doesn't follow this workaround absorbing convention

What do you mean? We should have all workarounds included in gfx9-generic, if not, that's an issue. Can you please tell me which workaround you think is missing? I'll follow up on it.

Thanks!
On a further read I was tripped up by the workarounds being in FeatureISAVersion9_0_Consumer_Common which is included in FeatureISAVersion9_Generic, rather than directly being in FeatureISAVersion9_Generic.
Your explanation is helpful to understand that "gfx9-generic" basic means "gfx9-consumer-generic"?
i.e. FeatureISAVersion9_0_Consumer_Common = FeatureISAVersion9_Generic = "gfx9-generic".
With the MI parts not being in this target, but still apply some of the same workarounds.

I had also forgotten binaries are built for multiple targets in the compute chain; hence why you don't want more targets.

FeatureRequiredExportPriority])>;

def FeatureISAVersion11_5_1 : FeatureSet<
!listconcat(FeatureISAVersion11_Common.Features,
[FeatureSALUFloatInsts,
FeatureDPPSrc1SGPR,
FeatureVGPRSingleUseHintInsts,
Feature1_5xVGPRs])>;
Feature1_5xVGPRs,
FeatureRequiredExportPriority])>;

def FeatureISAVersion11_5_2 : FeatureSet<
!listconcat(FeatureISAVersion11_Common.Features,
[FeatureSALUFloatInsts,
FeatureDPPSrc1SGPR,
FeatureVGPRSingleUseHintInsts])>;
FeatureVGPRSingleUseHintInsts,
FeatureRequiredExportPriority])>;

def FeatureISAVersion12 : FeatureSet<
[FeatureGFX12,
Expand Down
112 changes: 112 additions & 0 deletions llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
#include "GCNSubtarget.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "SIMachineFunctionInfo.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/ScheduleDAG.h"
#include "llvm/TargetParser/TargetParser.h"
Expand Down Expand Up @@ -1104,6 +1105,7 @@ void GCNHazardRecognizer::fixHazards(MachineInstr *MI) {
fixWMMAHazards(MI);
fixShift64HighRegBug(MI);
fixVALUMaskWriteHazard(MI);
fixRequiredExportPriority(MI);
}

bool GCNHazardRecognizer::fixVcmpxPermlaneHazards(MachineInstr *MI) {
Expand Down Expand Up @@ -2895,3 +2897,113 @@ bool GCNHazardRecognizer::fixVALUMaskWriteHazard(MachineInstr *MI) {

return true;
}

static bool ensureEntrySetPrio(MachineFunction *MF, int Priority,
const SIInstrInfo &TII) {
MachineBasicBlock &EntryMBB = MF->front();
if (EntryMBB.begin() != EntryMBB.end()) {
auto &EntryMI = *EntryMBB.begin();
if (EntryMI.getOpcode() == AMDGPU::S_SETPRIO &&
EntryMI.getOperand(0).getImm() >= Priority)
return false;
}

BuildMI(EntryMBB, EntryMBB.begin(), DebugLoc(), TII.get(AMDGPU::S_SETPRIO))
.addImm(Priority);
return true;
}

bool GCNHazardRecognizer::fixRequiredExportPriority(MachineInstr *MI) {
if (!ST.hasRequiredExportPriority())
return false;

// Assume the following shader types will never have exports,
// and avoid adding or adjusting S_SETPRIO.
MachineBasicBlock *MBB = MI->getParent();
MachineFunction *MF = MBB->getParent();
auto CC = MF->getFunction().getCallingConv();
switch (CC) {
case CallingConv::AMDGPU_CS:
case CallingConv::AMDGPU_CS_Chain:
case CallingConv::AMDGPU_CS_ChainPreserve:
case CallingConv::AMDGPU_KERNEL:
return false;
default:
break;
}

const int MaxPriority = 3;
const int NormalPriority = 2;
const int PostExportPriority = 0;

auto It = MI->getIterator();
switch (MI->getOpcode()) {
case AMDGPU::S_ENDPGM:
case AMDGPU::S_ENDPGM_SAVED:
case AMDGPU::S_ENDPGM_ORDERED_PS_DONE:
case AMDGPU::SI_RETURN_TO_EPILOG:
// Ensure shader with calls raises priority at entry.
// This ensures correct priority if exports exist in callee.
if (MF->getFrameInfo().hasCalls())
return ensureEntrySetPrio(MF, NormalPriority, TII);
return false;
case AMDGPU::S_SETPRIO: {
// Raise minimum priority unless in workaround.
auto &PrioOp = MI->getOperand(0);
int Prio = PrioOp.getImm();
bool InWA = (Prio == PostExportPriority) &&
(It != MBB->begin() && TII.isEXP(*std::prev(It)));
if (InWA || Prio >= NormalPriority)
return false;
PrioOp.setImm(std::min(Prio + NormalPriority, MaxPriority));
return true;
}
default:
if (!TII.isEXP(*MI))
return false;
break;
}

// Check entry priority at each export (as there will only be a few).
// Note: amdgpu_gfx can only be a callee, so defer to caller setprio.
bool Changed = false;
if (CC != CallingConv::AMDGPU_Gfx)
Changed = ensureEntrySetPrio(MF, NormalPriority, TII);

auto NextMI = std::next(It);
bool EndOfShader = false;
if (NextMI != MBB->end()) {
// Only need WA at end of sequence of exports.
if (TII.isEXP(*NextMI))
return Changed;
// Assume appropriate S_SETPRIO after export means WA already applied.
if (NextMI->getOpcode() == AMDGPU::S_SETPRIO &&
NextMI->getOperand(0).getImm() == PostExportPriority)
return Changed;
EndOfShader = NextMI->getOpcode() == AMDGPU::S_ENDPGM;
}

const DebugLoc &DL = MI->getDebugLoc();

// Lower priority.
BuildMI(*MBB, NextMI, DL, TII.get(AMDGPU::S_SETPRIO))
.addImm(PostExportPriority);

if (!EndOfShader) {
// Wait for exports to complete.
BuildMI(*MBB, NextMI, DL, TII.get(AMDGPU::S_WAITCNT_EXPCNT))
.addReg(AMDGPU::SGPR_NULL)
.addImm(0);
}

BuildMI(*MBB, NextMI, DL, TII.get(AMDGPU::S_NOP)).addImm(0);
BuildMI(*MBB, NextMI, DL, TII.get(AMDGPU::S_NOP)).addImm(0);

if (!EndOfShader) {
// Return to normal (higher) priority.
BuildMI(*MBB, NextMI, DL, TII.get(AMDGPU::S_SETPRIO))
.addImm(NormalPriority);
}

return true;
}
1 change: 1 addition & 0 deletions llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,7 @@ class GCNHazardRecognizer final : public ScheduleHazardRecognizer {
bool fixWMMAHazards(MachineInstr *MI);
bool fixShift64HighRegBug(MachineInstr *MI);
bool fixVALUMaskWriteHazard(MachineInstr *MI);
bool fixRequiredExportPriority(MachineInstr *MI);

int checkMAIHazards(MachineInstr *MI);
int checkMAIHazards908(MachineInstr *MI);
Expand Down
3 changes: 3 additions & 0 deletions llvm/lib/Target/AMDGPU/GCNSubtarget.h
Original file line number Diff line number Diff line change
Expand Up @@ -238,6 +238,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
bool HasVOPDInsts = false;
bool HasVALUTransUseHazard = false;
bool HasForceStoreSC0SC1 = false;
bool HasRequiredExportPriority = false;

bool RequiresCOV6 = false;

Expand Down Expand Up @@ -1282,6 +1283,8 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,

bool hasRestrictedSOffset() const { return HasRestrictedSOffset; }

bool hasRequiredExportPriority() const { return HasRequiredExportPriority; }

/// \returns true if the target uses LOADcnt/SAMPLEcnt/BVHcnt, DScnt/KMcnt
/// and STOREcnt rather than VMcnt, LGKMcnt and VScnt respectively.
bool hasExtendedWaitCounts() const { return getGeneration() >= GFX12; }
Expand Down
Loading
Loading