Updating xarch to utilize EVEX compares and blending where profitable #116983
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
CC @dotnet/jit-contrib. This should be ready for review; I could split this up into 2 PRs if preferred. This is one of the last major milestones for the embedded masking support and helps ensure that all vector sizes are getting the expected implicit lightup.
// Return Value:
//    true if the node's lowered instruction has EVEX embedded masking support
//
bool GenTree::isEmbeddedMaskingCompatible(Compiler* comp, unsigned tgtMaskSize, CorInfoType& tgtSimdBaseJitType) const
This is just pulling the logic from lowering to here so it can be reused in the two places that need it.
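As a purely illustrative aside (not the PR's actual implementation), the question this helper answers can be modeled as a lane-count check with an optional element re-typing. Every name and simplification below is hypothetical, and whether the node's actual operation tolerates re-typing is a separate check this toy ignores:

#include <cstdio>

// Toy model of a SIMD-producing node: total vector width and element width, in bytes.
struct SimdNode
{
    unsigned simdSize; // 16, 32, or 64
    unsigned elemSize; // 1, 2, 4, or 8
};

// A masking consumer expects `tgtMaskSize` lanes. The node is compatible if it
// already produces that many lanes, or if re-typing its elements (without
// changing any bits) would make the lane counts agree; in that case the new
// element size is reported through `newElemSize` so the caller can fix up the
// base type, mirroring the `tgtSimdBaseJitType` out-parameter above.
bool isEmbeddedMaskingCompatible(const SimdNode& node, unsigned tgtMaskSize, unsigned& newElemSize)
{
    newElemSize = node.elemSize;

    if ((node.simdSize / node.elemSize) == tgtMaskSize)
    {
        return true; // lane counts already match
    }

    if ((tgtMaskSize != 0) && ((node.simdSize % tgtMaskSize) == 0))
    {
        unsigned candidate = node.simdSize / tgtMaskSize;

        if ((candidate == 1) || (candidate == 2) || (candidate == 4) || (candidate == 8))
        {
            newElemSize = candidate;
            return true;
        }
    }

    return false;
}

int main()
{
    unsigned newElem = 0;
    SimdNode node{64, 4}; // a 64-byte vector of 32-bit lanes: 16 lanes

    std::printf("%d %u\n", isEmbeddedMaskingCompatible(node, 16, newElem), newElem); // 1 4
    std::printf("%d %u\n", isEmbeddedMaskingCompatible(node, 8, newElem), newElem);  // 1 8
    return 0;
}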
if (Lowering::IsInvariantInRange(op2, node, comp, scratchSideEffects))
{
    unsigned    tgtMaskSize        = simdSize / genTypeSize(simdBaseType);
    CorInfoType tgtSimdBaseJitType = CORINFO_TYPE_UNDEF;

    if (op2->isEmbeddedMaskingCompatible(comp, tgtMaskSize, tgtSimdBaseJitType))
    {
        // We are going to utilize the embedded mask, so we don't need to rewrite. However,
        // we want to fixup the simdBaseJitType here since it simplifies lowering and allows
        // both embedded broadcast and the mask to be live simultaneously.

        if (tgtSimdBaseJitType != CORINFO_TYPE_UNDEF)
        {
            op2->AsHWIntrinsic()->SetSimdBaseJitType(tgtSimdBaseJitType);
        }

        return;
    }
}
This was the cleanest/easiest way I could think of to do this.
The general idea is that we have several transforms we want to make to LIR before containment happens (because containment complicates these transforms). However, since these nodes are already in LIR form at this point, we need to make sure the transform is safe to do.
An alternative would be to add a pre-pass for lowering and then do a separate post-pass for containment, but that feels like a bigger/bulkier/riskier change.
There are some other transforms that would be nice to move "here" longer term, like the sequential insertps operation folding we do, or the recognition of AND(x, NOT(y)); neither of which we want to do in HIR because it breaks or massively complicates other optimizations (like folding and operation negation).
I'd be happy to make this a separate pass for .NET 11 if we feel that is better, but for .NET 10 I think this is the less risky approach.
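For illustration only, here is a tiny self-contained model of why the safety check matters once we are in LIR: a node can only be treated as if it were computed at its use when nothing between the two points could change its value. The names and the grossly simplified side-effect tracking below are hypothetical; the JIT's real check (IsInvariantInRange, as used in the snippet above) considers much more than this:

#include <cstdio>
#include <vector>

// Toy LIR: a flat list of nodes in execution order, with grossly simplified
// side-effect tracking (does the node read or write memory?).
struct Node
{
    const char* name;
    bool        readsMemory;
    bool        writesMemory;
};

// Can the node at `defIndex` be evaluated at `useIndex` instead (i.e. embedded
// into / contained by its user)? Only if nothing between the two positions can
// change what it would compute: here, no intervening memory write when the node
// reads memory. The real invariance check also considers locals, exceptions,
// and ordering side effects.
bool isInvariantInRange(const std::vector<Node>& lir, size_t defIndex, size_t useIndex)
{
    for (size_t i = defIndex + 1; i < useIndex; i++)
    {
        if (lir[defIndex].readsMemory && lir[i].writesMemory)
        {
            return false; // an intervening store could change the loaded value
        }
    }
    return true;
}

int main()
{
    std::vector<Node> lir = {
        {"load [p]", true, false},  // candidate to embed into the compare
        {"store [q]", false, true}, // intervening store blocks the transform
        {"compare", false, false},
    };

    std::printf("%d\n", isInvariantInRange(lir, 0, 2)); // 0: not safe, leave it alone
    return 0;
}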
The size regressions that are showing up are primarily from cases where we decide to fall back to the non-kmask variant. In other words, the few regressions are mainly due to #70182. We might be able to mitigate some of that by finding an existing CSE with a good VN in range or by doing some other backtracking search tricks, like we've done as workarounds for that issue; but I don't think we should block this PR on that, particularly since many important cases are improved.
This updates the xarch intrinsic logic to always import nodes as TYP_MASK where supported and to lower them back to the non-mask variants if no other optimizations were able to kick in. This allows better overall use of the hardware for existing intrinsic code paths.
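To make that flow concrete, a hedged sketch (all names hypothetical, not the JIT's actual types): compares are imported as mask producers, and lowering keeps the mask only when the user can consume it directly, otherwise converting back so the pre-existing non-mask patterns are emitted:

#include <cstdio>

// Hypothetical sketch of the decision described above: compares are created as
// mask producers, and lowering either keeps the mask (when the user consumes a
// mask directly, e.g. a masked blend) or converts it back to a vector so the
// pre-existing non-kmask patterns are emitted unchanged.
enum class NodeKind
{
    CompareMask,     // produces a kmask register value
    BlendWithMask,   // consumes a kmask (e.g. a masked blend)
    VectorOnlyUser,  // expects an all-ones/all-zeroes vector result
    CvtMaskToVector, // fallback conversion inserted by lowering
};

NodeKind lowerCompareForUser(NodeKind user)
{
    if (user == NodeKind::BlendWithMask)
    {
        // The mask is consumed directly: EVEX compare + masked blend is used.
        return NodeKind::CompareMask;
    }

    // No mask consumer: materialize the mask as a vector so existing
    // vector-shaped code paths still see the shape they expect.
    return NodeKind::CvtMaskToVector;
}

int main()
{
    std::printf("%d\n", static_cast<int>(lowerCompareForUser(NodeKind::BlendWithMask)));  // 0: keep the mask
    std::printf("%d\n", static_cast<int>(lowerCompareForUser(NodeKind::VectorOnlyUser))); // 3: convert back
    return 0;
}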