
Updating xarch to utilize EVEX compares and blending where profitable #116983


Open
wants to merge 4 commits into main

Conversation

tannergooding
Member

@tannergooding tannergooding commented Jun 24, 2025

This updates the xarch intrinsic logic to always import nodes as TYP_MASK where supported and to lower them back to the non-mask variants if no other optimizations were able to kick in. This allows better overall use of the hardware for existing intrinsic code paths.
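
For a sense of the instruction shape this enables, here is a minimal C++ intrinsics sketch (illustrative only, not the JIT's code; the function name and inputs are made up) of an EVEX compare into a mask register followed by a mask blend, the pattern the JIT can now prefer over a full-width vector compare plus vblendvps when profitable:

#include <immintrin.h> // requires AVX-512F + AVX-512VL (e.g. -mavx512f -mavx512vl)

// Hypothetical example: select elements of 'a' where x > y, else elements of 'b'.
__m256 select_greater(__m256 x, __m256 y, __m256 a, __m256 b)
{
    // vcmpps k, ymm, ymm, imm -- compare directly into a k (mask) register
    __mmask8 k = _mm256_cmp_ps_mask(x, y, _CMP_GT_OQ);

    // vblendmps ymm {k}, ymm, ymm -- blend under the mask; picks 'a' where k is set
    return _mm256_mask_blend_ps(k, b, a);
}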

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 24, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding tannergooding added the NO-REVIEW Experimental/testing PR, do NOT review it label Jun 24, 2025
@tannergooding tannergooding removed the NO-REVIEW Experimental/testing PR, do NOT review it label Jul 1, 2025
@tannergooding tannergooding marked this pull request as ready for review July 1, 2025 19:47
@tannergooding
Member Author

CC @dotnet/jit-contrib. This should be ready for review. I could split this up into two PRs ("Allow rewriting of hwintrinsic mask ops to their non-mask forms" and "Default to using mask ops for V128/V256 on supporting hardware"), but I think it's worth doing these two together.

This is one of the last major milestones for the embedded masking support and helps ensure that all vector sizes are getting the expected implicit lightup.

@tannergooding tannergooding requested a review from EgorBo July 1, 2025 19:50
// Return Value:
// true if the instruction the node lowers to has EVEX embedded masking support
//
bool GenTree::isEmbeddedMaskingCompatible(Compiler* comp, unsigned tgtMaskSize, CorInfoType& tgtSimdBaseJitType) const
Member Author


This is just pulling the logic from lowering to here so it can be reused in the two places that need it.

Comment on lines +659 to +676
if (Lowering::IsInvariantInRange(op2, node, comp, scratchSideEffects))
{
unsigned tgtMaskSize = simdSize / genTypeSize(simdBaseType);
CorInfoType tgtSimdBaseJitType = CORINFO_TYPE_UNDEF;

if (op2->isEmbeddedMaskingCompatible(comp, tgtMaskSize, tgtSimdBaseJitType))
{
// We are going to utilize the embedded mask, so we don't need to rewrite. However,
// we want to fixup the simdBaseJitType here since it simplifies lowering and allows
// both embedded broadcast and the mask to be live simultaneously.

if (tgtSimdBaseJitType != CORINFO_TYPE_UNDEF)
{
op2->AsHWIntrinsic()->SetSimdBaseJitType(tgtSimdBaseJitType);
}
return;
}
}
Member Author


This was the cleanest/easiest way I could think of to do this.

The general idea is that we have several transforms we want to make to LIR before containment happens (because containment complicates these transforms). However, since we're in LIR form for these nodes at this point, we need to make sure the transform is safe to do.

An alternative would be to add a pre-pass for lowering and then do a separate post-pass for containment, but that feels like a bigger/bulkier/riskier change.

There are some other transforms that would be nice to move "here" longer term, like the folding of sequential insertps operations or the recognition of AND(x, NOT(y)) (sketched below); neither of these is something we want to do in HIR because it breaks or massively complicates other optimizations (like folding and operation negation) that we do.

I'd be happy to make this a separate pass for .NET 11, if we feel that is better. But for .NET 10 I think this is a less risky approach.
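
As a concrete illustration of the AND(x, NOT(y)) shape mentioned above, a minimal C++ intrinsics sketch (illustrative only, not the JIT transform itself):

#include <immintrin.h> // requires AVX2 (-mavx2)

// Computes x & ~y, which maps to a single vpandn. Note that the hardware
// instruction negates its first operand, so the operands are swapped here.
__m256i and_not(__m256i x, __m256i y)
{
    return _mm256_andnot_si256(y, x);
}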

@tannergooding
Member Author

The size regressions that are showing up are primarily from cases where we decide to fall back to the non-kmask variant and the comparison is against zero. Such cases now have to emit an extra vxorps since it isn't CSE'd.

In other words, the few regressions are mainly due to #70182. We might be able to mitigate some of that by finding an existing CSE with a good VN in range or by doing other backtracking search tricks, like we've done as workarounds for that issue; but I don't think we should block this PR on that, particularly since many important cases are improved and the vxorps is elided by the register renamer since it's just producing 0.
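
For context on the vxorps point, here is a minimal C++ intrinsics sketch (illustrative only, not the affected code) of the non-kmask compare-against-zero shape: the zero operand has to be materialized, typically as vxorps, which is a zeroing idiom the register renamer resolves without an execution uop, so the extra instruction mostly costs code size:

#include <immintrin.h> // requires AVX (-mavx)

// Legacy (non-kmask) compare against zero: the zero vector is materialized explicitly.
__m256 less_than_zero(__m256 x)
{
    __m256 zero = _mm256_setzero_ps();          // vxorps ymm, ymm, ymm (zeroing idiom)
    return _mm256_cmp_ps(x, zero, _CMP_LT_OQ);  // VEX vcmpps producing a vector mask
}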
