Skip to content

[AArch64] Change cost of (s|z)ext <4 x i8> -> <4 x i32> to 2. #141029

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented May 22, 2025

Increase the cost for zext <4 x i8> -> <4 x i32> to 2 to match codegen, which currently uses 2 instructions (commonly ushll.8h, ushll.4s).

Example lowering: https://llvm.godbolt.org/z/58TEoxh4v

Increase the cost for zext <4 x i8> -> <4 x i32> to 2 to match codegen,
which currently uses 2 instructions (commonly ushll.8h, ushll.4s).

Example lowering: https://llvm.godbolt.org/z/58TEoxh4v
@llvmbot
Copy link
Member

llvmbot commented May 22, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-aarch64

Author: Florian Hahn (fhahn)

Changes

Increase the cost for zext <4 x i8> -> <4 x i32> to 2 to match codegen, which currently uses 2 instructions (commonly ushll.8h, ushll.4s).

Example lowering: https://llvm.godbolt.org/z/58TEoxh4v


Full diff: https://github.com/llvm/llvm-project/pull/141029.diff

6 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+2)
  • (modified) llvm/test/Analysis/CostModel/AArch64/arith-widening.ll (+18-18)
  • (modified) llvm/test/Analysis/CostModel/AArch64/cast.ll (+4-4)
  • (modified) llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll (+1-1)
  • (modified) llvm/test/Analysis/CostModel/AArch64/sve-cast.ll (+12-12)
  • (modified) llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll (+1-1)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 3f10da23b3494..43d03de2f46cf 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -3161,6 +3161,8 @@ InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
       {ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i16, 3},
       {ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i32, 2},
       {ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i32, 2},
+      {ISD::SIGN_EXTEND, MVT::v4i32, MVT::v4i8, 2},
+      {ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i8, 2},
       {ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i8, 3},
       {ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i8, 3},
       {ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 2},
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll b/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll
index 7e1588f427be4..d1b93667c5506 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll
@@ -293,15 +293,15 @@ define void @extaddv4(<4 x i8> %i8, <4 x i16> %i16, <4 x i32> %i32, <4 x i64> %i
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl1_8_16 = zext <4 x i8> %i8 to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl2_8_16 = zext <4 x i8> %i8 to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azl_8_16 = add <4 x i16> %zl1_8_16, %zl2_8_16
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %asw_8_32 = add <4 x i32> %i32, %sw_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %asl_8_32 = add <4 x i32> %sl1_8_32, %sl2_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azw_8_32 = add <4 x i32> %i32, %zw_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azl_8_32 = add <4 x i32> %zl1_8_32, %zl2_8_32
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_64 = sext <4 x i8> %i8 to <4 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %asw_8_64 = add <4 x i64> %i64, %sw_8_64
@@ -988,15 +988,15 @@ define void @extsubv4(<4 x i8> %i8, <4 x i16> %i16, <4 x i32> %i32, <4 x i64> %i
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl1_8_16 = zext <4 x i8> %i8 to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl2_8_16 = zext <4 x i8> %i8 to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azl_8_16 = sub <4 x i16> %zl1_8_16, %zl2_8_16
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %asw_8_32 = sub <4 x i32> %i32, %sw_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %asl_8_32 = sub <4 x i32> %sl1_8_32, %sl2_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azw_8_32 = sub <4 x i32> %i32, %zw_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azl_8_32 = sub <4 x i32> %zl1_8_32, %zl2_8_32
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_64 = sext <4 x i8> %i8 to <4 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %asw_8_64 = sub <4 x i64> %i64, %sw_8_64
@@ -1683,15 +1683,15 @@ define void @extmulv4(<4 x i8> %i8, <4 x i16> %i16, <4 x i32> %i32, <4 x i64> %i
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl1_8_16 = zext <4 x i8> %i8 to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl2_8_16 = zext <4 x i8> %i8 to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azl_8_16 = mul <4 x i16> %zl1_8_16, %zl2_8_16
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %asw_8_32 = mul <4 x i32> %i32, %sw_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %asl_8_32 = mul <4 x i32> %sl1_8_32, %sl2_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azw_8_32 = mul <4 x i32> %i32, %zw_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azl_8_32 = mul <4 x i32> %zl1_8_32, %zl2_8_32
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_64 = sext <4 x i8> %i8 to <4 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:28 CodeSize:1 Lat:1 SizeLat:1 for: %asw_8_64 = mul <4 x i64> %i64, %sw_8_64
diff --git a/llvm/test/Analysis/CostModel/AArch64/cast.ll b/llvm/test/Analysis/CostModel/AArch64/cast.ll
index 38bd98ffd343f..618d71b6c125d 100644
--- a/llvm/test/Analysis/CostModel/AArch64/cast.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/cast.ll
@@ -41,8 +41,8 @@ define void @ext() {
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -883,8 +883,8 @@ define i32 @load_extends() {
 ; CHECK-NEXT:  Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
diff --git a/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll b/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll
index 7595abc71a9d8..1d1a5169691cd 100644
--- a/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll
@@ -582,7 +582,7 @@ define <8 x i16> @neg_dissimilar_operand_kind_0(<8 x i8> %a, <8 x i8> %b) {
 }
 
 ; COST-LABEL: neg_dissimilar_operand_kind_1
-; COST-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %tmp0 = zext <4 x i8> %a to <4 x i32>
+; COST-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %tmp0 = zext <4 x i8> %a to <4 x i32>
 ; COST-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %tmp1 = zext <4 x i16> %b to <4 x i32>
 define <4 x i32> @neg_dissimilar_operand_kind_1(<4 x i8> %a, <4 x i16> %b) {
   %tmp0 = zext <4 x i8> %a to <4 x i32>
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll b/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll
index cfb130eb5ec32..9c114ca85bd3b 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll
@@ -42,8 +42,8 @@ define void @ext() {
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; CHECK-SVE-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; CHECK-SVE-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -184,8 +184,8 @@ define void @ext() {
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-256-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-256-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -255,8 +255,8 @@ define void @ext() {
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -1809,8 +1809,8 @@ define i32 @load_extends() #0 {
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-SVE-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-SVE-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
@@ -1899,8 +1899,8 @@ define i32 @load_extends() #0 {
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-256-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-256-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
@@ -1944,8 +1944,8 @@ define i32 @load_extends() #0 {
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll
index 36826eb6681c8..4ef5221e93147 100644
--- a/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll
@@ -883,7 +883,7 @@ entry:
 
 
 ; COST-LABEL: Function:  mla_v4i8_i32
-; COST: Cost:            '-6'
+; COST: Cost:            '-4'
 define i32 @mla_v4i8_i32(ptr %x, ptr %y) "target-features"="+dotprod" {
 ; CHECK-LABEL: @mla_v4i8_i32(
 ; CHECK-NEXT:  entry:

@llvmbot
Copy link
Member

llvmbot commented May 22, 2025

@llvm/pr-subscribers-llvm-analysis

Author: Florian Hahn (fhahn)

Changes

Increase the cost for zext <4 x i8> -> <4 x i32> to 2 to match codegen, which currently uses 2 instructions (commonly ushll.8h, ushll.4s).

Example lowering: https://llvm.godbolt.org/z/58TEoxh4v


Full diff: https://github.com/llvm/llvm-project/pull/141029.diff

6 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+2)
  • (modified) llvm/test/Analysis/CostModel/AArch64/arith-widening.ll (+18-18)
  • (modified) llvm/test/Analysis/CostModel/AArch64/cast.ll (+4-4)
  • (modified) llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll (+1-1)
  • (modified) llvm/test/Analysis/CostModel/AArch64/sve-cast.ll (+12-12)
  • (modified) llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll (+1-1)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 3f10da23b3494..43d03de2f46cf 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -3161,6 +3161,8 @@ InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
       {ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i16, 3},
       {ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i32, 2},
       {ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i32, 2},
+      {ISD::SIGN_EXTEND, MVT::v4i32, MVT::v4i8, 2},
+      {ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i8, 2},
       {ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i8, 3},
       {ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i8, 3},
       {ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 2},
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll b/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll
index 7e1588f427be4..d1b93667c5506 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll
@@ -293,15 +293,15 @@ define void @extaddv4(<4 x i8> %i8, <4 x i16> %i16, <4 x i32> %i32, <4 x i64> %i
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl1_8_16 = zext <4 x i8> %i8 to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl2_8_16 = zext <4 x i8> %i8 to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azl_8_16 = add <4 x i16> %zl1_8_16, %zl2_8_16
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %asw_8_32 = add <4 x i32> %i32, %sw_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %asl_8_32 = add <4 x i32> %sl1_8_32, %sl2_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azw_8_32 = add <4 x i32> %i32, %zw_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azl_8_32 = add <4 x i32> %zl1_8_32, %zl2_8_32
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_64 = sext <4 x i8> %i8 to <4 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %asw_8_64 = add <4 x i64> %i64, %sw_8_64
@@ -988,15 +988,15 @@ define void @extsubv4(<4 x i8> %i8, <4 x i16> %i16, <4 x i32> %i32, <4 x i64> %i
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl1_8_16 = zext <4 x i8> %i8 to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl2_8_16 = zext <4 x i8> %i8 to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azl_8_16 = sub <4 x i16> %zl1_8_16, %zl2_8_16
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %asw_8_32 = sub <4 x i32> %i32, %sw_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %asl_8_32 = sub <4 x i32> %sl1_8_32, %sl2_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azw_8_32 = sub <4 x i32> %i32, %zw_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azl_8_32 = sub <4 x i32> %zl1_8_32, %zl2_8_32
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_64 = sext <4 x i8> %i8 to <4 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %asw_8_64 = sub <4 x i64> %i64, %sw_8_64
@@ -1683,15 +1683,15 @@ define void @extmulv4(<4 x i8> %i8, <4 x i16> %i16, <4 x i32> %i32, <4 x i64> %i
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl1_8_16 = zext <4 x i8> %i8 to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl2_8_16 = zext <4 x i8> %i8 to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azl_8_16 = mul <4 x i16> %zl1_8_16, %zl2_8_16
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %asw_8_32 = mul <4 x i32> %i32, %sw_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %asl_8_32 = mul <4 x i32> %sl1_8_32, %sl2_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azw_8_32 = mul <4 x i32> %i32, %zw_8_32
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %azl_8_32 = mul <4 x i32> %zl1_8_32, %zl2_8_32
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_64 = sext <4 x i8> %i8 to <4 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:28 CodeSize:1 Lat:1 SizeLat:1 for: %asw_8_64 = mul <4 x i64> %i64, %sw_8_64
diff --git a/llvm/test/Analysis/CostModel/AArch64/cast.ll b/llvm/test/Analysis/CostModel/AArch64/cast.ll
index 38bd98ffd343f..618d71b6c125d 100644
--- a/llvm/test/Analysis/CostModel/AArch64/cast.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/cast.ll
@@ -41,8 +41,8 @@ define void @ext() {
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -883,8 +883,8 @@ define i32 @load_extends() {
 ; CHECK-NEXT:  Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
 ; CHECK-NEXT:  Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
diff --git a/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll b/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll
index 7595abc71a9d8..1d1a5169691cd 100644
--- a/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll
@@ -582,7 +582,7 @@ define <8 x i16> @neg_dissimilar_operand_kind_0(<8 x i8> %a, <8 x i8> %b) {
 }
 
 ; COST-LABEL: neg_dissimilar_operand_kind_1
-; COST-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %tmp0 = zext <4 x i8> %a to <4 x i32>
+; COST-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %tmp0 = zext <4 x i8> %a to <4 x i32>
 ; COST-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %tmp1 = zext <4 x i16> %b to <4 x i32>
 define <4 x i32> @neg_dissimilar_operand_kind_1(<4 x i8> %a, <4 x i16> %b) {
   %tmp0 = zext <4 x i8> %a to <4 x i32>
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll b/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll
index cfb130eb5ec32..9c114ca85bd3b 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll
@@ -42,8 +42,8 @@ define void @ext() {
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; CHECK-SVE-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; CHECK-SVE-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -184,8 +184,8 @@ define void @ext() {
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-256-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-256-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -255,8 +255,8 @@ define void @ext() {
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -1809,8 +1809,8 @@ define i32 @load_extends() #0 {
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-SVE-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-SVE-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
 ; CHECK-SVE-NEXT:  Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
@@ -1899,8 +1899,8 @@ define i32 @load_extends() #0 {
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-256-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-256-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
 ; FIXED-MIN-256-NEXT:  Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
@@ -1944,8 +1944,8 @@ define i32 @load_extends() #0 {
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
 ; FIXED-MIN-2048-NEXT:  Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll
index 36826eb6681c8..4ef5221e93147 100644
--- a/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll
@@ -883,7 +883,7 @@ entry:
 
 
 ; COST-LABEL: Function:  mla_v4i8_i32
-; COST: Cost:            '-6'
+; COST: Cost:            '-4'
 define i32 @mla_v4i8_i32(ptr %x, ptr %y) "target-features"="+dotprod" {
 ; CHECK-LABEL: @mla_v4i8_i32(
 ; CHECK-NEXT:  entry:

Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to note separately is that the costs from the custom tables are only used for RThru costs, the others default to 1, which is not really accurate either, but that's the same for all entries in the lookup tables

@davemgreen
Copy link
Collaborator

Do you have any details about what this improves for you? I am not against it, it sounds sensible, but if you look at the cost of the load it is already 2, as it includes the cost of v4i8 load + zext to v4i16. A v4i8 is legalized to a v4i16 and the cost model somewhat attempts to account for that. In some cases the cost of will certainly be 2 though.

One thing to note separately is that the costs from the custom tables are only used for RThru costs, the others default to 1, which is not really accurate either, but that's the same for all entries in the lookup tables

I think it might have ran into problems with unrolling, when I tried it. I may give it another go, I had a big stack of patches for them that I didn't do much with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants