Simplify codegen of mixed-type checked integer addition and subtraction #15878

Conversation
It might be nice to put the snippets with the native Crystal equivalents directly into code comments.
This definitely looks like an improvement, but if the issue was that the types used to perform the arithmetic were inefficient, why not just switch those out? I tried doing that, and I think the resulting assembly looks even better for u8/i8/u16/i16 (I haven't tested other types). I also added a test function to the provided Godbolt sample at the end, performing multiple mixed-sign, mixed-size additions, to show how well LLVM can inline and combine the arithmetic operations, and it seems to be more efficient. The test code shown here performs the operation:

Original x86 ASM (before this MR):

```asm
test_sum_of_array_old:
pushq %rax
movzwl (%rdi), %ecx
movsbl 2(%rdi), %eax
addl %ecx, %eax
movl %eax, %ecx
shll $15, %ecx
sarl $15, %ecx
cmpl %eax, %ecx
jne .LBB12_10
testl %ecx, %ecx
js .LBB12_10
movzbl 4(%rdi), %ecx
addw 6(%rdi), %cx
jb .LBB12_10
movzwl %cx, %edx
cmpl $256, %edx
jae .LBB12_10
addw %cx, %ax
jb .LBB12_10
movsbl 8(%rdi), %ecx
movzwl 10(%rdi), %esi
leal (%rcx,%rsi), %edx
addw %si, %cx
jo .LBB12_10
addl $-128, %edx
movzwl %dx, %edx
cmpl $65279, %edx
jbe .LBB12_10
movzwl %ax, %eax
movsbl %cl, %ecx
addl %eax, %ecx
movl %ecx, %eax
shll $15, %eax
sarl $15, %eax
cmpl %ecx, %eax
jne .LBB12_10
testl %eax, %eax
js .LBB12_10
popq %rcx
retq
.LBB12_10:
callq __crystal_raise_overflow
```

Current x86 ASM (this MR):

```asm
test_sum_of_array_pr:
pushq %rax
movzwl (%rdi), %ecx
movsbl 2(%rdi), %eax
xorl $32768, %ecx
addw %ax, %cx
jo .LBB10_7
movzwl 6(%rdi), %eax
movzwl %ax, %edx
cmpl $256, %edx
jae .LBB10_7
addb 4(%rdi), %al
jb .LBB10_7
xorl $32768, %ecx
movzbl %al, %eax
addw %cx, %ax
jb .LBB10_7
movsbl 8(%rdi), %ecx
movzwl 10(%rdi), %edx
movswl %cx, %ecx
addl %edx, %ecx
movsbl %cl, %edx
cmpw %cx, %dx
jne .LBB10_7
xorl $32768, %eax
addw %cx, %ax
jo .LBB10_7
xorl $32768, %eax
popq %rcx
retq
.LBB10_7:
callq __crystal_raise_overflow
```

New x86 ASM (always cast to u64/i64):

```asm
test_sum_of_array_new:
pushq %rax
movzwl (%rdi), %eax
movsbq 2(%rdi), %rcx
addq %rax, %rcx
cmpq $65536, %rcx
jae .LBB13_6
movzwl 6(%rdi), %edx
movzbl 4(%rdi), %eax
addq %rdx, %rax
cmpq $256, %rax
jae .LBB13_6
addq %rcx, %rax
cmpq $65536, %rax
jae .LBB13_6
movsbq 8(%rdi), %rdx
movswq 10(%rdi), %rcx
addq %rdx, %rcx
movsbq %cl, %rdx
cmpq %rcx, %rdx
jne .LBB13_6
addq %rcx, %rax
cmpq $65536, %rax
jae .LBB13_6
popq %rcx
retq
.LBB13_6:
callq __crystal_raise_overflow
```

This PR: https://godbolt.org/z/cajb65ex5

I haven't tested this with bigger types such as u32 or i128, but I think we could use this to improve performance even further while keeping the codegen code as simple as before.
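For reference, a minimal Crystal sketch of the approach described above: widen both operands to 64 bits, perform the addition there (where it cannot overflow), and range-check the result against the receiver's type. The helper name `checked_add_widened` and the `Int8`/`UInt8` instantiation are hypothetical, for illustration only:

```crystal
# Hypothetical sketch of the "always cast to i64" idea for Int8 + UInt8.
def checked_add_widened(x : Int8, y : UInt8) : Int8
  sum = x.to_i64 + y.to_i64 # exact: an Int8 plus a UInt8 always fits in Int64
  # `to_i8` (without `!`) raises OverflowError when `sum` falls outside
  # Int8's range, mirroring the bounds checks in the assembly above.
  sum.to_i8
end
```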
Code generation for mixed-type uses of overflow-checked primitive integer addition or subtraction is currently somewhat convoluted, in particular when the receiver is signed and the argument unsigned, or vice versa. The compiler would create an intermediate integer type that has one more bit than the operands, e.g. `i9` or `i33`; most LLVM targets do not like these types, and this leads to some rather unsightly LLVM IR that is hard to optimize.

This PR is a complete overhaul of the way these additions and subtractions are emitted. There are now multiple specialized code paths depending on whether the two operand types have the same signedness or width. The following Crystal snippet illustrates how each code path could be expressed equivalently in terms of other primitive calls in native Crystal:
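As a minimal sketch of the mixed-signedness, same-width path only (assuming an `Int8` receiver and a `UInt8` argument; the helper name `checked_add` is illustrative, not the compiler's actual code):

```crystal
# Sketch: checked Int8 + UInt8 expressed via same-signedness primitives.
def checked_add(x : Int8, y : UInt8) : Int8
  # XOR with 0x80 biases the signed operand into unsigned space:
  # -0x80_i8..0x7F_i8 maps linearly onto 0x00_u8..0xFF_u8.
  biased = x.to_u8! ^ 0x80_u8
  # This same-signedness checked addition overflows if and only if the
  # original mixed-signedness addition would overflow Int8.
  sum = biased + y
  # Undo the bias; XOR with 0x80 is its own inverse.
  (sum ^ 0x80_u8).to_i8!
end
```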
(Before and after on Compiler Explorer)
The gist here is that mixed-signedness operations are transformed into same-signedness ones by applying a bias to the first operand and switching its signedness, using a bitwise XOR. For example, `-0x80_i8..0x7F_i8` maps linearly to `0x00_u8..0xFF_u8`, and vice versa. The same-signedness arithmetic operation that follows will overflow if and only if the original operation does. The result is XOR'ed again afterwards, since the bias is its own inverse. This is the trick that allows mixed-type addition or subtraction without resorting to arcane integer bit widths.
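As a concrete round trip (hypothetical values, following the sketch above):

```crystal
x = -1_i8
biased = x.to_u8! ^ 0x80_u8 # 0xFF ^ 0x80 = 0x7F (127)
sum = biased + 2_u8         # 127 + 2 = 129 (0x81); no UInt8 overflow
p (sum ^ 0x80_u8).to_i8!    # 0x81 ^ 0x80 = 0x01 => 1, i.e. -1 + 2
```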