
Sve: Preliminary support for agnostic VL for JIT scenarios #115948


Draft · kunalspathak wants to merge 131 commits into main

Commits (changes shown from 96 of 131 commits)
d22af4f
Capture g_sve_length and compVectorTLength
kunalspathak Mar 19, 2025
41a1d05
Add InstructionSet_Vector
kunalspathak Mar 19, 2025
c7d8ede
Add CORINFO_HFA_ELEM_VECTOR_VL
kunalspathak Mar 19, 2025
926eb69
Update the type of TYP_SIMD
kunalspathak Mar 19, 2025
2b39810
Passing Vector<T> to args and returns
kunalspathak Mar 19, 2025
cf9ea60
Rename TYP_SIMD -> TYP_SIMDVL
kunalspathak Mar 19, 2025
21f364b
Fix code to save/restore upper registers of VL
kunalspathak Mar 19, 2025
7a513ed
misc changes
kunalspathak Mar 20, 2025
b1c9833
Bring TYP_SIMD32 and TYP_SIMD64 for Arm64
kunalspathak Mar 20, 2025
4f92c23
Eliminate TYP_SIMDVL
kunalspathak Mar 21, 2025
6e63a3c
basic scenario of calling args/returning args
kunalspathak Mar 21, 2025
1eb159f
returning Vectors
kunalspathak Mar 22, 2025
df7203f
fix a bug
kunalspathak Mar 22, 2025
734aba5
standalone fix to generate sve mov instead of NEON mov
kunalspathak Mar 22, 2025
a71b8de
standalone fix to generate ldr/str when emit_RR is called
kunalspathak Mar 24, 2025
2e8cfd5
Support Vector.Create
kunalspathak Mar 24, 2025
1d74f82
Do not do sve_mov for scalar variant
kunalspathak Mar 25, 2025
699d2e1
Support Vector.As
kunalspathak Mar 25, 2025
7f8ff24
Support Vector.Abs
kunalspathak Mar 25, 2025
3d19d51
Support Vector.Add
kunalspathak Mar 25, 2025
70c09f9
Introduce VariableVectorLength env variable
kunalspathak Mar 25, 2025
53df3d7
Support Vector.AndNot
kunalspathak Mar 25, 2025
b1d4ce9
Support Vector.As*
kunalspathak Mar 26, 2025
29564cb
Support Vector.BitwiseAnd/BitwiseOr
kunalspathak Mar 26, 2025
45ab7b9
Support Vector.ConvertTo*
kunalspathak Mar 26, 2025
3837693
Add CreateFalseMaskAll intrinsic
kunalspathak Mar 27, 2025
ca1675c
Temporary fix for scratch register size calculation. Need to revisit
kunalspathak Mar 28, 2025
7774e07
Fix to squash in 9542e9cd047
kunalspathak Mar 28, 2025
c170a7e
Support Vector.Equals*, GreaterThan*, LessThan*
kunalspathak Mar 28, 2025
15f0384
Support Vector.Max/MaxNative
kunalspathak Mar 28, 2025
84d7bf3
Support Vector.Min/MinNative
kunalspathak Mar 28, 2025
2dff8b8
Support Vector.MinNumber/MaxNumber
kunalspathak Mar 28, 2025
58c872c
Support Vector.IsPositive/IsNegative/IsPositiveInfinity
kunalspathak Mar 29, 2025
d6d197d
Support Vector.get_Zero/One/AllBitsSet
kunalspathak Mar 29, 2025
ad47578
Support Vector.get_Indices/Sve.Index
kunalspathak Mar 29, 2025
fafee9a
Support Vector.Multiply
kunalspathak Mar 29, 2025
b475834
Support Vector.Subtract
kunalspathak Mar 29, 2025
37a78d7
Support Vector.Divide
kunalspathak Mar 29, 2025
e9eeca6
Support Vector.op_Xor
kunalspathak Mar 29, 2025
8e90959
Support Vector.op_OnesComplement/op_UnaryNegation/op_UnaryPlus
kunalspathak Mar 31, 2025
e00d016
Support Vector.MultiplyAddEstimate
kunalspathak Mar 31, 2025
f14f792
Support Vector.IsZero/IsNaN
kunalspathak Mar 31, 2025
e976b40
Support Vector.Floor
kunalspathak Mar 31, 2025
cb68fb9
Support Vector.FusedMultiplyAdd
kunalspathak Mar 31, 2025
fe633ed
Support Vector.Ceiling
kunalspathak Mar 31, 2025
2285a07
Support Vector.Round
kunalspathak Mar 31, 2025
9bdb3b9
Support Vector.LoadVector*
kunalspathak Mar 31, 2025
5c6392c
Support Vector.Store*
kunalspathak Mar 31, 2025
bf9991c
Support Vector.WidenLower/WidenUpper
kunalspathak Mar 31, 2025
a04d52b
Support Vector.Truncate
kunalspathak Mar 31, 2025
8376fc1
Support Vector.ConditionalSelect
kunalspathak Mar 31, 2025
1cebe09
Support Vector.Create/Add Sve_DuplicateScalarToVector
kunalspathak Apr 1, 2025
c626047
Support Vector.CreateSequence/Fix Sve_Index
kunalspathak Apr 1, 2025
62a2d9f
Support Vector.LeftShift/Add Sve_ShiftLeftLogicalImm
kunalspathak Apr 2, 2025
cd17e41
Support Vector.ShiftRightLogical/RightShift Add Sve.ShiftRight*Imm
kunalspathak Apr 3, 2025
f9567fd
Support Vector.ToScalar
kunalspathak Apr 3, 2025
9145170
Support Vector.Sum
kunalspathak Apr 3, 2025
4a76f71
build errors fix
kunalspathak Apr 4, 2025
a102b6f
Make GetScalableHWIntrinsicId() to all platforms to avoid #ifdef in c…
kunalspathak Apr 4, 2025
eead7d7
For unroll strategy, continue using 16B size
kunalspathak Apr 7, 2025
6d139ee
Fix some errors for Vector_opEquality
kunalspathak Apr 7, 2025
715a2c0
Disable optimizations for unroll/memcopy, etc.
kunalspathak Apr 8, 2025
b5d4460
Add comments in runtime where correct VectorT size should be reflected
kunalspathak Apr 8, 2025
15bb8a4
Fix bug for Vector.ConvertToDouble
kunalspathak Apr 8, 2025
9e99f27
Add jit-ee GetTargetVectorLength()
kunalspathak Apr 8, 2025
a9367ad
Use MinVectorLengthForSve()
kunalspathak Apr 10, 2025
9d9b20b
Fix correct type in LSRA
kunalspathak Apr 11, 2025
8d8ba75
Introduce for now FakeVectorLength environment variable
kunalspathak Apr 12, 2025
41c7629
Convert all checks to use varTypeIsSIMDVL()
kunalspathak Apr 12, 2025
6e6cc12
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak May 16, 2025
9cc2794
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak May 16, 2025
c03bb1c
wip
kunalspathak May 20, 2025
8afd32a
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak May 20, 2025
df8c7ab
gen.bat update
kunalspathak May 20, 2025
8ee5339
Refactor to UseSveFor*()
kunalspathak May 21, 2025
abd6e21
build failure
kunalspathak May 21, 2025
c212d25
more build failure fix
kunalspathak May 21, 2025
7b11beb
more build failure
kunalspathak May 22, 2025
5dcd5e9
Handle vector length in methodtablebuilder
kunalspathak May 22, 2025
c6c6671
simplify the logic of UseSveForVectorT
kunalspathak May 23, 2025
a4d5a9b
minor cleanup
kunalspathak May 23, 2025
e5f308f
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak May 25, 2025
c2e5c23
jit format
kunalspathak May 25, 2025
decd987
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak May 27, 2025
be418ae
resolve merge conflict
kunalspathak May 27, 2025
1a33102
Do some tracking of simdType
kunalspathak May 28, 2025
a5889f6
Remove constraint of vector being only 16 bytes
kunalspathak May 28, 2025
f97a198
TEMP: Enable SVE for 16B as well
kunalspathak May 28, 2025
897f474
fix bugs for using TYP_SIMD16 for SVE
kunalspathak May 28, 2025
63a31fb
fix bug for str/ldr using reserved register
kunalspathak May 29, 2025
05cfde4
Support to generate SVE for 16B too - use isScalable
kunalspathak Jun 10, 2025
a8020fa
Handle Multiply and MultiplyByScalar
kunalspathak Jun 11, 2025
f6e82cf
REVERT: Enable SVE for VectorT (for testing)
kunalspathak Jun 11, 2025
784bc9d
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak Jun 11, 2025
5df58bf
merge conflict errors
kunalspathak Jun 11, 2025
1e97247
fix build errors after merge
kunalspathak Jun 11, 2025
355856d
fix linux build error
kunalspathak Jun 11, 2025
dd0d483
fix the Xor for float/double
kunalspathak Jun 11, 2025
182ba12
fix the typo for equality operator
kunalspathak Jun 12, 2025
a3e364d
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak Jun 12, 2025
7841140
another build error fix
kunalspathak Jun 12, 2025
10f2530
Fix the spilling of predicate registers
kunalspathak Jun 12, 2025
47b106f
Make sure to check if retNode is HWIntrinsic
kunalspathak Jun 12, 2025
b4ca14a
Add missing break
kunalspathak Jun 12, 2025
98f0e25
fix a typo for mapping zeroextend intrinsic
kunalspathak Jun 13, 2025
53f4c81
handle Vector.Equal() and similar APIs that return Vector<T> instead …
kunalspathak Jun 13, 2025
429e32d
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak Jun 14, 2025
7c26e21
fix merge conflict
kunalspathak Jun 14, 2025
ee9a4eb
fix the bad merge
kunalspathak Jun 14, 2025
dbbd311
add missing break
kunalspathak Jun 15, 2025
635148c
jit format
kunalspathak Jun 15, 2025
9e70dd0
Disable Vector's WidenUpper and WidenLower intrinsic
kunalspathak Jun 16, 2025
f482e65
Do not generate SVE if not supported
kunalspathak Jun 16, 2025
05b4d06
Changes from #116726
kunalspathak Jun 16, 2025
6a73a98
Handle cases for shift amount as Vector<T>
kunalspathak Jun 18, 2025
d523ee3
Fix Vector.ConditionalSelect
kunalspathak Jun 19, 2025
98fa9f3
Fix Multiple Vector<T> * T case
kunalspathak Jun 19, 2025
9074461
Add entry for VectorMath test in ISA
kunalspathak Jun 19, 2025
bcb7bee
Fix CreateSequence for float/double
kunalspathak Jun 19, 2025
f151c64
MUL with DuplicateScalarToVector
kunalspathak Jun 19, 2025
838ce58
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak Jun 19, 2025
39374e3
fix merge conflict errors
kunalspathak Jun 19, 2025
324d241
Fix the value numbering
kunalspathak Jun 20, 2025
fc24657
disable Sve when it is not available
kunalspathak Jun 20, 2025
a997047
jit format
kunalspathak Jun 20, 2025
f10bb0b
fix the cmpOpNode return to TYP_MASK
kunalspathak Jun 24, 2025
303d7ce
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak Jun 24, 2025
49a536a
fix merge conflict errors
kunalspathak Jun 24, 2025
8368b81
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak Jun 24, 2025
61ed25f
fix merge conflicts
kunalspathak Jun 24, 2025
7f88033
fix parameter ordering because of bad merge conflict resolution
kunalspathak Jun 25, 2025
2 changes: 2 additions & 0 deletions src/coreclr/inc/clrconfigvalues.h
@@ -285,6 +285,8 @@ CONFIG_DWORD_INFO(INTERNAL_GCUseGlobalAllocationContext, W("GCUseGlobalAllocatio
///
CONFIG_DWORD_INFO(INTERNAL_JitBreakEmit, W("JitBreakEmit"), (DWORD)-1, "")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_JitDebuggable, W("JitDebuggable"), 0, "If set, suppress JIT optimizations that make debugging code difficult")
CONFIG_DWORD_INFO(INTERNAL_UseSveForVectorT, W("UseSveForVectorT"), 1, "Prefer SVE instructions for VectorT")

#if !defined(DEBUG) && !defined(_DEBUG)
#define INTERNAL_JitEnableNoWayAssert_Default 0
#else
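The new UseSveForVectorT knob above follows the standard CLR config pattern. As a hedged illustration of how it might be consumed on the VM side (the helper below is hypothetical, though CLRConfig::GetConfigValue is the usual accessor for CONFIG_DWORD_INFO entries):

#include "clrconfig.h"

// Sketch only: query the UseSveForVectorT switch (defined above with default 1).
// This helper is an assumption for illustration, not code from this PR.
static bool ShouldUseSveForVectorT()
{
    return CLRConfig::GetConfigValue(CLRConfig::INTERNAL_UseSveForVectorT) != 0;
}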
2 changes: 2 additions & 0 deletions src/coreclr/inc/corhdr.h
@@ -1754,6 +1754,8 @@ typedef enum CorInfoHFAElemType : unsigned {
CORINFO_HFA_ELEM_DOUBLE,
CORINFO_HFA_ELEM_VECTOR64,
CORINFO_HFA_ELEM_VECTOR128,
CORINFO_HFA_ELEM_VECTOR256,
CORINFO_HFA_ELEM_VECTOR512,
Member comment on lines +1757 to +1758:

These seem unnecessary and/or inappropriate to add.

It feels like there should be a single CORINFO_HFA_ELEM_VECTOR instead, to indicate explicitly that this is a variable-length vector, as the SVE ABI may have differing conventions/considerations and we want to identify it as such.
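A minimal sketch of the shape this suggestion implies (the single enumerator and its comment are assumptions, not part of this PR):

typedef enum CorInfoHFAElemType : unsigned {
    CORINFO_HFA_ELEM_NONE,
    CORINFO_HFA_ELEM_FLOAT,
    CORINFO_HFA_ELEM_DOUBLE,
    CORINFO_HFA_ELEM_VECTOR64,
    CORINFO_HFA_ELEM_VECTOR128,
    // Hypothetical: one enumerator for variable-length (SVE) vectors, so the
    // SVE ABI's differing conventions can be identified explicitly.
    CORINFO_HFA_ELEM_VECTOR,
} CorInfoHFAElemType;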

} CorInfoHFAElemType;

//
41 changes: 23 additions & 18 deletions src/coreclr/inc/corinfoinstructionset.h
@@ -25,24 +25,25 @@ enum CORINFO_InstructionSet
InstructionSet_Sha1=7,
InstructionSet_Sha256=8,
InstructionSet_Atomics=9,
InstructionSet_Vector64=10,
InstructionSet_Vector128=11,
InstructionSet_Dczva=12,
InstructionSet_Rcpc=13,
InstructionSet_VectorT128=14,
InstructionSet_Rcpc2=15,
InstructionSet_Sve=16,
InstructionSet_Sve2=17,
InstructionSet_ArmBase_Arm64=18,
InstructionSet_AdvSimd_Arm64=19,
InstructionSet_Aes_Arm64=20,
InstructionSet_Crc32_Arm64=21,
InstructionSet_Dp_Arm64=22,
InstructionSet_Rdm_Arm64=23,
InstructionSet_Sha1_Arm64=24,
InstructionSet_Sha256_Arm64=25,
InstructionSet_Sve_Arm64=26,
InstructionSet_Sve2_Arm64=27,
InstructionSet_Vector=10,
Member comment:

Do we need this? Can't it just be InstructionSet_Sve instead?

Member (Author) reply:

We do map them to SVE intrinsics in impSpecialIntrinsic. The problem is that when we import VectorT.Add, for example, we do a lookup and need an intrinsic that tracks that we are operating on a size-agnostic entity, which is different from Vector128.Add. Today we would just map VectorT -> Vector128, and the information about whether we are operating on a size-agnostic or a fixed 16B vector is lost while creating the GenTree nodes. I am planning to map the VectorT methods to their own corresponding VectorT intrinsics and, in impSpecialIntrinsic, map those to the SVE equivalents.
Lots of methods map one-to-one with an SVE equivalent, but a few, like GetElement, MultiplyByScalar, and ToScalar, need more than one API operation to achieve the result. So after the importer, we should technically not see any NI_Vector* intrinsic nodes.
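A rough sketch of the importer-time mapping described here (the helper and the NI_Vector_* names follow the existing NI_* convention but are assumptions for illustration):

// Sketch only: map size-agnostic Vector<T> intrinsics to SVE equivalents during
// import, so no NI_Vector_* node survives past the importer.
static NamedIntrinsic MapVectorToSveIntrinsic(NamedIntrinsic id)
{
    switch (id)
    {
        case NI_Vector_Add:
            return NI_Sve_Add;      // direct one-to-one mapping
        case NI_Vector_Subtract:
            return NI_Sve_Subtract; // direct one-to-one mapping
        default:
            // GetElement, MultiplyByScalar, ToScalar, etc. need more than one
            // SVE operation and would be expanded separately.
            return NI_Illegal;
    }
}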

InstructionSet_Vector64=11,
InstructionSet_Vector128=12,
InstructionSet_Dczva=13,
InstructionSet_Rcpc=14,
InstructionSet_VectorT128=15,
InstructionSet_Rcpc2=16,
InstructionSet_Sve=17,
InstructionSet_Sve2=18,
InstructionSet_ArmBase_Arm64=19,
InstructionSet_AdvSimd_Arm64=20,
InstructionSet_Aes_Arm64=21,
InstructionSet_Crc32_Arm64=22,
InstructionSet_Dp_Arm64=23,
InstructionSet_Rdm_Arm64=24,
InstructionSet_Sha1_Arm64=25,
InstructionSet_Sha256_Arm64=26,
InstructionSet_Sve_Arm64=27,
InstructionSet_Sve2_Arm64=28,
#endif // TARGET_ARM64
#ifdef TARGET_RISCV64
InstructionSet_RiscV64Base=1,
@@ -379,6 +380,8 @@ inline CORINFO_InstructionSetFlags EnsureInstructionSetFlagsAreValid(CORINFO_Ins
resultflags.RemoveInstructionSet(InstructionSet_Sve);
if (resultflags.HasInstructionSet(InstructionSet_Sve2) && !resultflags.HasInstructionSet(InstructionSet_Sve))
resultflags.RemoveInstructionSet(InstructionSet_Sve2);
if (resultflags.HasInstructionSet(InstructionSet_Vector) && !resultflags.HasInstructionSet(InstructionSet_Sve))
resultflags.RemoveInstructionSet(InstructionSet_Vector);
#endif // TARGET_ARM64
#ifdef TARGET_RISCV64
if (resultflags.HasInstructionSet(InstructionSet_Zbb) && !resultflags.HasInstructionSet(InstructionSet_RiscV64Base))
@@ -627,6 +630,8 @@ inline const char *InstructionSetToString(CORINFO_InstructionSet instructionSet)
return "Sha256_Arm64";
case InstructionSet_Atomics :
return "Atomics";
case InstructionSet_Vector :
return "Vector";
case InstructionSet_Vector64 :
return "Vector64";
case InstructionSet_Vector128 :
2 changes: 2 additions & 0 deletions src/coreclr/inc/corjit.h
@@ -438,6 +438,8 @@ class ICorJitInfo : public ICorDynamicInfo
//
virtual uint32_t getExpectedTargetArchitecture() = 0;

virtual uint32_t getTargetVectorLength() = 0;

// Fetches extended flags for a particular compilation instance. Returns
// the number of bytes written to the provided buffer.
virtual uint32_t getJitFlags(
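The VM side has to answer getTargetVectorLength() somehow; one plausible implementation on Linux/arm64 queries the kernel's SVE state. A sketch under that assumption (not code from this PR):

#include <stdint.h>
#include <sys/prctl.h>

// Sketch only: report the target vector length in bytes, falling back to the
// fixed 16-byte NEON width when SVE is unavailable.
static uint32_t GetTargetVectorLengthInBytes()
{
    int ret = prctl(PR_SVE_GET_VL);
    if (ret < 0)
    {
        return 16; // no SVE support: Vector<T> stays 128-bit
    }
    return (uint32_t)(ret & PR_SVE_VL_LEN_MASK); // low bits hold VL in bytes
}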
2 changes: 2 additions & 0 deletions src/coreclr/inc/icorjitinfoimpl_generated.h
@@ -744,6 +744,8 @@ uint16_t getRelocTypeHint(

uint32_t getExpectedTargetArchitecture() override;

uint32_t getTargetVectorLength() override;

uint32_t getJitFlags(
CORJIT_FLAGS* flags,
uint32_t sizeInBytes) override;
10 changes: 5 additions & 5 deletions src/coreclr/inc/jiteeversionguid.h
@@ -37,11 +37,11 @@

#include <minipal/guid.h>

constexpr GUID JITEEVersionIdentifier = { /* 7a77e6d9-7280-439d-bb9d-9887b4516a86 */
0x7a77e6d9,
0x7280,
0x439d,
{0xbb, 0x9d, 0x98, 0x87, 0xb4, 0x51, 0x6a, 0x86}
constexpr GUID JITEEVersionIdentifier = { /* 49287d16-74bd-42e9-9d47-132d7a5f67eb */
0x49287d16,
0x74bd,
0x42e9,
{0x9d, 0x47, 0x13, 0x2d, 0x7a, 0x5f, 0x67, 0xeb}
};

#endif // JIT_EE_VERSIONING_GUID_H
1 change: 1 addition & 0 deletions src/coreclr/jit/ICorJitInfo_names_generated.h
@@ -180,6 +180,7 @@ DEF_CLR_API(recordCallSite)
DEF_CLR_API(recordRelocation)
DEF_CLR_API(getRelocTypeHint)
DEF_CLR_API(getExpectedTargetArchitecture)
DEF_CLR_API(getTargetVectorLength)
DEF_CLR_API(getJitFlags)
DEF_CLR_API(getSpecialCopyHelper)

8 changes: 8 additions & 0 deletions src/coreclr/jit/ICorJitInfo_wrapper_generated.hpp
@@ -1743,6 +1743,14 @@ uint32_t WrapICorJitInfo::getExpectedTargetArchitecture()
return temp;
}

uint32_t WrapICorJitInfo::getTargetVectorLength()
{
API_ENTER(getTargetVectorLength);
uint32_t temp = wrapHnd->getTargetVectorLength();
API_LEAVE(getTargetVectorLength);
return temp;
}

uint32_t WrapICorJitInfo::getJitFlags(
CORJIT_FLAGS* flags,
uint32_t sizeInBytes)
10 changes: 9 additions & 1 deletion src/coreclr/jit/abi.cpp
@@ -123,7 +123,15 @@ var_types ABIPassingSegment::GetRegisterType() const
#ifdef FEATURE_SIMD
case 16:
return TYP_SIMD16;
#endif
#ifdef TARGET_ARM64
case 32:
assert(Compiler::SizeMatchesVectorTLength(Size));
return TYP_SIMD32;
case 64:
assert(Compiler::SizeMatchesVectorTLength(Size));
return TYP_SIMD64;
#endif // TARGET_ARM64
#endif // FEATURE_SIMD
default:
assert(!"Unexpected size for floating point register");
return TYP_UNDEF;
7 changes: 4 additions & 3 deletions src/coreclr/jit/assertionprop.cpp
@@ -284,6 +284,8 @@ bool IntegralRange::Contains(int64_t value) const
// Example: IntCns = 42 gives [0..127] with a non -precise range, [42,42] with a precise range.
return {SymbolicIntegerValue::Zero, SymbolicIntegerValue::ByteMax};
#elif defined(TARGET_ARM64)
case NI_Vector_op_Equality:
case NI_Vector_op_Inequality:
case NI_Vector64_op_Equality:
case NI_Vector64_op_Inequality:
case NI_Vector128_op_Equality:
@@ -2983,8 +2985,7 @@ GenTree* Compiler::optVNBasedFoldConstExpr(BasicBlock* block, GenTree* parent, G
conValTree = vecCon;
break;
}

#if defined(TARGET_XARCH)
#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
case TYP_SIMD32:
{
simd32_t value = vnStore->ConstantValue<simd32_t>(vnCns);
@@ -3008,7 +3009,7 @@ GenTree* Compiler::optVNBasedFoldConstExpr(BasicBlock* block, GenTree* parent, G
}
break;

#endif // TARGET_XARCH
#endif // TARGET_XARCH || TARGET_ARM64
#endif // FEATURE_SIMD

#if defined(FEATURE_MASKED_HW_INTRINSICS)
147 changes: 138 additions & 9 deletions src/coreclr/jit/codegenarm64.cpp
@@ -2280,6 +2280,9 @@ void CodeGen::genSetRegToConst(regNumber targetReg, var_types targetType, GenTre
{
// We ignore any differences between SIMD12 and SIMD16 here if we can broadcast the value
// via mvni/movi.
// Also, even if UseSveForVectorT == true, we will continue loading values into V* registers
// instead of Z* registers, because their sizes are the same when VL == 16.

const bool is8 = tree->TypeIs(TYP_SIMD8);
if (vecCon->IsAllBitsSet())
{
Expand All @@ -2298,12 +2301,12 @@ void CodeGen::genSetRegToConst(regNumber targetReg, var_types targetType, GenTre
emit->emitIns_R_I(INS_movi, attr, targetReg, val.i32[0], is8 ? INS_OPTS_2S : INS_OPTS_4S);
}
else if (ElementsAreSame(val.i16, is8 ? 4 : 8) &&
emitter::emitIns_valid_imm_for_movi(val.i16[0], EA_2BYTE))
{
emit->emitIns_R_I(INS_movi, attr, targetReg, val.i16[0], is8 ? INS_OPTS_4H : INS_OPTS_8H);
}
else if (ElementsAreSame(val.i8, is8 ? 8 : 16) &&
emitter::emitIns_valid_imm_for_movi(val.i8[0], EA_1BYTE))
{
emit->emitIns_R_I(INS_movi, attr, targetReg, val.i8[0], is8 ? INS_OPTS_8B : INS_OPTS_16B);
}
@@ -2329,6 +2332,92 @@
}
break;
}
case TYP_SIMD32:
{
// Use scalable registers
if (vecCon->IsAllBitsSet())
{
// Use Scalable_B because for Ones, it doesn't matter.
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, -1, INS_OPTS_SCALABLE_B);
}
else if (vecCon->IsZero())
{
// Use Scalable_B because for Zero, it doesn't matter.
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, 0, INS_OPTS_SCALABLE_B);
}
else
{
simd32_t val = vecCon->gtSimd32Val;
if (ElementsAreSame(val.i8, 32))
{
emit->emitIns_R_I(INS_sve_dup, EA_SCALABLE, targetReg, val.i8[0], INS_OPTS_SCALABLE_B);
}
else if (ElementsAreSame(val.i16, 16))
{
emit->emitIns_R_I(INS_sve_dup, EA_SCALABLE, targetReg, val.i16[0], INS_OPTS_SCALABLE_H);
}
else if (ElementsAreSame(val.i32, 8))
{
emit->emitIns_R_I(INS_sve_dup, EA_SCALABLE, targetReg, val.i32[0], INS_OPTS_SCALABLE_S);
}
else
{
// Get a temp integer register to compute long address.
regNumber addrReg = internalRegisters.GetSingle(tree);
CORINFO_FIELD_HANDLE hnd;
hnd = emit->emitSimdConst(&vecCon->gtSimdVal, emitTypeSize(tree->TypeGet()));
emit->emitIns_R_C(INS_sve_ldr, attr, targetReg, addrReg, hnd, 0);
// emit->emitIns_R_C(INS_adr, EA_8BYTE, addrReg, REG_NA, hnd, 0);
// emit->emitIns_R_R_R_I(INS_sve_ld1b, EA_SCALABLE, targetReg, REG_P1, addrReg, 0,
// INS_OPTS_SCALABLE_B);
}
}
break;
}
case TYP_SIMD64:
{
// Use scalable registers
if (vecCon->IsAllBitsSet())
{
// Use Scalable_B because for Ones, it doesn't matter.
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, -1, INS_OPTS_SCALABLE_B);
}
else if (vecCon->IsZero())
{
// Use Scalable_B because for Zero, it doesn't matter.
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, 0, INS_OPTS_SCALABLE_B);
}
else
{
simd64_t val = vecCon->gtSimd64Val;
if (ElementsAreSame(val.i32, 16) && emitter::isValidSimm_MultipleOf<8, 256>(val.i32[0]))
{
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, val.i32[0], INS_OPTS_SCALABLE_S,
INS_SCALABLE_OPTS_IMM_BITMASK);
}
else if (ElementsAreSame(val.i16, 32) && emitter::isValidSimm_MultipleOf<8, 256>(val.i16[0]))
{
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, val.i16[0], INS_OPTS_SCALABLE_H,
INS_SCALABLE_OPTS_IMM_BITMASK);
}
else if (ElementsAreSame(val.i8, 64) && emitter::isValidSimm<8>(val.i8[0]))
{
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, val.i8[0], INS_OPTS_SCALABLE_B,
INS_SCALABLE_OPTS_IMM_BITMASK);
}
else
{
// Get a temp integer register to compute long address.
regNumber addrReg = internalRegisters.GetSingle(tree);
CORINFO_FIELD_HANDLE hnd;
simd64_t constValue;
memcpy(&constValue, &vecCon->gtSimdVal, sizeof(simd64_t));
hnd = emit->emitSimdConst(&vecCon->gtSimdVal, emitTypeSize(tree->TypeGet()));
emit->emitIns_R_C(INS_sve_ldr, attr, targetReg, addrReg, hnd, 0);
}
}
break;
}

default:
{
@@ -2955,7 +3044,18 @@
}
}
emitAttr attr = emitActualTypeSize(targetType);
GetEmitter()->emitIns_Mov(INS_mov, attr, retReg, op1->GetRegNum(), /* canSkip */ !movRequired);
bool isScalable = (attr == EA_SCALABLE) || (Compiler::UseSveForType(targetType));

if (isScalable)
{
// TODO-VL: Should we check the baseType, or does it not matter because this is just a reg->reg move?
GetEmitter()->emitIns_Mov(INS_sve_mov, attr, retReg, op1->GetRegNum(), /* canSkip */ !movRequired,
INS_OPTS_SCALABLE_Q);
}
else
{
GetEmitter()->emitIns_Mov(INS_mov, attr, retReg, op1->GetRegNum(), /* canSkip */ !movRequired);
}
}

/***********************************************************************************************
@@ -5247,14 +5347,28 @@

GenTreeLclVar* lclNode = op1->AsLclVar();
LclVarDsc* varDsc = compiler->lvaGetDesc(lclNode);
assert(emitTypeSize(varDsc->GetRegisterType(lclNode)) == 16);

regNumber tgtReg = node->GetRegNum();
assert(tgtReg != REG_NA);
unsigned varSize = emitTypeSize(varDsc->GetRegisterType(lclNode));
assert((varSize == 16) || (Compiler::SizeMatchesVectorTLength(varSize)));

regNumber op1Reg = genConsumeReg(op1);
assert(op1Reg != REG_NA);

regNumber tgtReg = node->GetRegNum();
#ifdef TARGET_ARM64
// TODO-VL: Write a helper to do this check for LclVars*, GenTree*, etc.
if (Compiler::UseStrictSveForType(op1->TypeGet()))
{
// Until we have a custom ABI for SVE, we just store the entire contents of the Z*
// registers on the stack. If we don't, we will need multiple free registers to save
// the contents of everything but the lower 8 bytes.
assert(tgtReg == REG_NA);

GetEmitter()->emitIns_S_R(INS_sve_str, EA_SCALABLE, op1Reg, lclNode->GetLclNum(), 0);
return;
}
#endif // TARGET_ARM64
assert(tgtReg != REG_NA);

GetEmitter()->emitIns_R_R_I_I(INS_mov, EA_8BYTE, tgtReg, op1Reg, 0, 1);

if ((node->gtFlags & GTF_SPILL) != 0)
@@ -5303,10 +5417,12 @@

GenTreeLclVar* lclNode = op1->AsLclVar();
LclVarDsc* varDsc = compiler->lvaGetDesc(lclNode);
assert(emitTypeSize(varDsc->GetRegisterType(lclNode)) == 16);

unsigned varSize = emitTypeSize(varDsc->GetRegisterType(lclNode));
assert((varSize == 16) || (Compiler::SizeMatchesVectorTLength(varSize)));

regNumber srcReg = node->GetRegNum();
assert(srcReg != REG_NA);
assert((srcReg != REG_NA) || (Compiler::UseStrictSveForType(node->TypeGet())));

regNumber lclVarReg = genConsumeReg(lclNode);
assert(lclVarReg != REG_NA);
@@ -5318,6 +5434,19 @@
// The localVar must have a stack home.
assert(varDsc->lvOnFrame);

#ifdef TARGET_ARM64
// TODO-VL: Write a helper to do this check for LclVars*, GenTree*, etc.
if (Compiler::UseStrictSveForType(op1->TypeGet()))
{
// Until we have a custom ABI for SVE, the entire contents of the Z* registers are
// stored on the stack, so just reload them from there. Otherwise we would need
// multiple free registers to restore everything but the lower 8 bytes.

GetEmitter()->emitIns_R_S(INS_sve_ldr, EA_SCALABLE, lclVarReg, varNum, 0);
return;
}
#endif // TARGET_ARM64

// We will load this from the upper 8 bytes of this localVar's home.
int offset = 8;
