-
Notifications
You must be signed in to change notification settings - Fork 5k
JIT doesn't optimize away redundant checks for Span.Length #9608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@dotnet/jit-contrib |
I investigated this (the second case with local copies). When I tried to repro the issue, I wasn't getting the call to Span.CopyTo inlined, the jit thought the inline was unprofitable. So I added [AggressiveInlining] to Span.CopyTo and then got the same codegen as reported above. The reason the jit doesn't remove the redundant check is that the jit's assertion propagation phase only creates assertions for GT_EQ and GT_NE nodes used in jumps and in this case we have a GT_GT node. Here is the relevant comment from the code:
I prototyped an implementation that works for GT_GT nodes: With that change the second comparison, the second jump, and the code at the target of the second jumps are eliminated:
Getting this change checked in will be somewhat challenging because we have a limit on the number of assertions generated per function (256) so generating these new assertions will likely bump existing assertions from some functions leading to CQ regressions. Removing this limit is something we are considering but it may have throughput implications. |
Note that the specific condition |
Codegen on main: ; Assembly listing for method MustHaveValueBenchmark.C:DoSomething(System.Span`1[ubyte],System.Span`1[ubyte])
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 2 inlinees with PGO data; 4 single block inlinees; 1 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T01] ( 4, 7 ) byref -> rcx ld-addr-op single-def
; V01 arg1 [V01,T00] ( 4, 8 ) byref -> rdx ld-addr-op single-def
; V02 OutArgs [V02 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
;* V03 tmp1 [V03 ] ( 0, 0 ) int -> zero-ref "impAppendStmt"
;* V04 tmp2 [V04 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V05 tmp3 [V05 ] ( 0, 0 ) int -> zero-ref "impAppendStmt"
;* V06 tmp4 [V06 ] ( 0, 0 ) byref -> zero-ref single-def "Inlining Arg"
;* V07 tmp5 [V07 ] ( 0, 0 ) byref -> zero-ref single-def "Inlining Arg"
; V08 tmp6 [V08,T04] ( 2, 2 ) long -> rax "Inlining Arg"
;* V09 tmp7 [V09 ] ( 0, 0 ) byref -> zero-ref V15._reference(offs=0x00) P-INDEP "field V00._reference (fldOffset=0x0)"
;* V10 tmp8 [V10 ] ( 0, 0 ) int -> zero-ref V15._length(offs=0x08) P-INDEP "field V00._length (fldOffset=0x8)"
; V11 tmp9 [V11,T05] ( 2, 1.50) byref -> r8 single-def V16._reference(offs=0x00) P-INDEP "field V01._reference (fldOffset=0x0)"
; V12 tmp10 [V12,T02] ( 3, 2.50) int -> rdx V16._length(offs=0x08) P-INDEP "field V01._length (fldOffset=0x8)"
; V13 tmp11 [V13,T06] ( 2, 1 ) byref -> rcx single-def V04._reference(offs=0x00) P-INDEP "field V04._reference (fldOffset=0x0)"
;* V14 tmp12 [V14,T07] ( 0, 0 ) int -> zero-ref V04._length(offs=0x08) P-INDEP "field V04._length (fldOffset=0x8)"
;* V15 tmp13 [V15 ] ( 0, 0 ) struct (16) zero-ref "Promoted implicit byref"
;* V16 tmp14 [V16 ] ( 0, 0 ) struct (16) zero-ref "Promoted implicit byref"
; V17 cse0 [V17,T03] ( 2, 2 ) int -> rax "CSE - aggressive"
;
; Lcl frame size = 40
G_M6266_IG01:
sub rsp, 40
;; size=4 bbWeight=1 PerfScore 0.25
G_M6266_IG02:
mov r8, bword ptr [rdx]
mov edx, dword ptr [rdx+08H]
mov eax, dword ptr [rcx+08H]
cmp edx, eax
ja SHORT G_M6266_IG05
;; size=13 bbWeight=1 PerfScore 7.25
G_M6266_IG03:
mov rcx, bword ptr [rcx]
mov eax, edx
mov rdx, r8
mov r8, rax
call [System.Buffer:Memmove(byref,byref,ulong)]
nop
;; size=18 bbWeight=0.50 PerfScore 3.00
G_M6266_IG04:
add rsp, 40
ret
;; size=5 bbWeight=0.50 PerfScore 0.62
G_M6266_IG05:
mov rcx, 0xD1FFAB1E ; 'Can't copy, dest too short.'
call [System.Console:WriteLine(System.String)]
nop
;; size=17 bbWeight=0.50 PerfScore 1.75
G_M6266_IG06:
add rsp, 40
ret
;; size=5 bbWeight=0.50 PerfScore 0.62
; Total bytes of code 62, prolog size 4, PerfScore 19.70, instruction count 19, allocated bytes for code 62 (MethodHash=574ee785) for method MustHaveValueBenchmark.C:DoSomething(System.Span`1[ubyte],System.Span`1[ubyte])
; ============================================================ Looks fixed by RBO: Dominator BB02 of BB03 has relop with same liberal VN
N006 ( 6, 6) [000011] N---G--N-U- ▌ GT int <l:$240, c:$241>
N001 ( 1, 1) [000025] ----------- ├──▌ LCL_VAR int V12 tmp10 u:1 <l:$1c0, c:$200>
N005 ( 4, 4) [000117] n---G------ └──▌ IND int <l:$1c1, c:$201>
N004 ( 2, 2) [000116] -------N--- └──▌ ADD byref $181
N002 ( 1, 1) [000029] ----------- ├──▌ LCL_VAR byref V00 arg0 u:1 $80
N003 ( 1, 1) [000115] ----------- └──▌ CNS_INT long 8 $140
Redundant compare; current relop:
N003 ( 5, 4) [000041] N------N-U- ▌ GT int <l:$240, c:$242>
N001 ( 1, 1) [000033] ----------- ├──▌ LCL_VAR int V12 tmp10 u:1 <l:$1c0, c:$200>
N002 ( 3, 2) [000060] ----------- └──▌ LCL_VAR int V14 tmp12 u:1 (last use) <l:$1c1, c:$202>
Fall through successor BB03 of BB02 reaches, relop [000041] must be false
Redundant branch opt in BB03:
removing useless STMT00010 ( INL03 @ ??? ... ??? ) <- INLRT @ 0x010[E-]
N004 ( 7, 6) [000042] ----------- ▌ JTRUE void $VN.Void
N003 ( 5, 4) [000041] ----------- └──▌ CNS_INT int 0
from BB03
Conditional folded at BB03
BB03 becomes a BBJ_NONE
Compiler::optRedundantBranch removed tree:
N004 ( 7, 6) [000042] ----------- ▌ JTRUE void $VN.Void
N003 ( 5, 4) [000041] ----------- └──▌ CNS_INT int 0 |
In scenarios where a caller performs a comparison and where the callee (which has been inlined into the caller) performs that same comparison, the JIT won't consolidate the redundant checks into a single check. See examples below.
I thought maybe it had something to do with querying dest's and src's length being an indirect lookup (
[rdx+8]
or[rcx+8]
), so I also tried capturing local copies to guarantee that everything is enregistered.category:cq
theme:bounds-checks
skill-level:expert
cost:small
The text was updated successfully, but these errors were encountered: