[DO NOT MERGE!!!] Introduce tilesPerWarp parameter, scale preshuffling and bypassingLDS #809

Draft
wants to merge 3 commits into
base: shared/triton-gfx950-launch
from

plognjen

No description provided.

@@ -674,6 +654,8 @@ LinearLayout mfmaDotToLinearLayout(DotOperandEncodingAttr dotMfmaLayout,
auto kDim = dotMfmaLayout.getOpIdx() == 0 ? rank - 1 : rank - 2;
int32_t kSize = shape[kDim];
auto warpsPerCTA = mfmaLayout.getWarpsPerCTA();
auto tilesPerWarp = mfmaLayout.getTilesPerWarp();
auto tilePerWarpNonK = tilesPerWarp[kDim];

This tilesPerWarp comes from mfmaLayout, which has no "kDim", so indexing it with kDim is unclear. Can you be more specific here?

auto regs = mlir::triton::identityStandardND(kRegister, {1, 1}, order);
LinearLayout lanes = LinearLayout::empty();
auto kDim = dotOperandIdx == 0 ? rank - 1 : rank - 2;
auto tilePerWarpNonK = tilesPerWarp[kDim];

Here we are dealing with operands, which have both a kDim and a nonKDim. If you need the nonKDim value, you should use tilesPerWarp[nonKDim], not tilesPerWarp[kDim].
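
To illustrate the indexing the review comment above asks for, here is a minimal standalone sketch. It assumes the usual dot operand convention (operand 0 is A with shape [..., M, K], operand 1 is B with shape [..., K, N]) and that tilesPerWarp is indexed like the result's [..., M, N] dimensions. `DotOperandDims`, `getDotOperandDims`, and `tilesPerWarpNonK` are hypothetical names for illustration only; they are not part of Triton or this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not a Triton API): the two dimension indices
// of a dot operand.
struct DotOperandDims {
  std::size_t kDim;
  std::size_t nonKDim;
};

// Assumption: opIdx == 0 is operand A ([..., M, K]) and opIdx == 1 is
// operand B ([..., K, N]), matching the kDim expression in the diff.
inline DotOperandDims getDotOperandDims(unsigned opIdx, std::size_t rank) {
  assert(rank >= 2 && "dot operands need at least two dimensions");
  assert(opIdx < 2 && "opIdx must be 0 (A) or 1 (B)");
  std::size_t kDim = (opIdx == 0) ? rank - 1 : rank - 2;
  std::size_t nonKDim = (opIdx == 0) ? rank - 2 : rank - 1;
  return {kDim, nonKDim};
}

// Picks the tiles-per-warp count along the operand's non-K dimension.
// Assumption: tilesPerWarp is laid out like the result's [..., M, N] dims,
// so A's non-K index selects M and B's non-K index selects N.
inline unsigned tilesPerWarpNonK(const std::vector<unsigned> &tilesPerWarp,
                                 unsigned opIdx) {
  DotOperandDims dims = getDotOperandDims(opIdx, tilesPerWarp.size());
  return tilesPerWarp[dims.nonKDim];
}
```

With tilesPerWarp = {2, 4}, this sketch selects 2 for operand A and 4 for operand B, whereas indexing with kDim would select the opposite entry in each case.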

antiagainst force-pushed the shared/triton-gfx950-launch branch from 77c00fa to a259f0a on May 26, 2025 17:58
plognjen force-pushed the shared/preshuffle-scales-new branch from f259473 to 7edf97c on May 26, 2025 19:41
zhanglx13 and others added 2 commits May 27, 2025 07:53
- Only swap elems in getValuesFromDotOperandLayoutStruct if preshuffle env var is set
- Only set tilesPerWarp = [2, 2] for scaledDot with mxfp4 case

3 participants