Skip to content

JIT: Cast UInt64 to Single directly during const folding #106419

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Aug 20, 2024
Merged
2 changes: 1 addition & 1 deletion src/coreclr/jit/gentree.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -15607,7 +15607,7 @@ GenTree* Compiler::gtFoldExprConst(GenTree* tree)

case TYP_FLOAT:
{
#if defined(TARGET_64BIT)
#ifdef TARGET_64BIT
if (tree->IsUnsigned() && (lval1 < 0))
{
f1 = FloatingPointUtils::convertUInt64ToFloat((uint64_t)lval1);
Expand Down
5 changes: 5 additions & 0 deletions src/coreclr/jit/utils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2272,8 +2272,13 @@ double FloatingPointUtils::convertUInt64ToDouble(uint64_t uIntVal)

float FloatingPointUtils::convertUInt64ToFloat(uint64_t u64)
{
#ifdef TARGET_ARM64
// ARM64 supports casting directly to float
return (float)u64;
#else // !TARGET_ARM64
double d = convertUInt64ToDouble(u64);
return (float)d;
#endif // !TARGET_ARM64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this bug exists on all platforms, not just Arm64.

The code in the test is

ulong vr10 = 16105307123914158031UL;
float vr11 = 4294967295U | vr10;

This boils down to 4294967295U | 16105307123914158031UL which is 16105307124325679103

The correct float result is then 16105306574569865216.0f, or rather the raw bits 1600094603 (the DEBUG output).

The result produced by the two step conversion, however, is 16105307674081492992.0f, or rather the raw bits 1600094604 (the RELEASE output).

I would expect that all our current compilers (MSVC, Clang, and GCC) are producing correct results for return (float)u64 and we no longer need to do a two step conversion. If any are still producing incorrect results, we should log bugs against them and we'd need to hand roll an implementation that does the conversion instead (I can give reference to one if we need it, but I don't think we do).

Copy link
Member Author

@amanasifkhalid amanasifkhalid Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I tried both the one- and two-step conversion on x64, and in Debug and Release, I get 1600094604. So it looks like the MSVC/Clang/GCC codegen on the const-folding path is matching what RyuJIT currently emits in Debug for x64. For this test case, is the definition of "correct" architecture-dependent? I'm guessing there's some subtle difference in rounding behavior between ARM64 and x64?

Regardless of the answer to that, I think you're right that we can update the cast logic for all platforms. I can do that in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried both the one- and two-step conversion on x64, and in Debug and Release, I get 1600094604

Are you sure? I don't see the same behavior in godbolt: https://godbolt.org/z/s5MrTWaYM
-- note that m1 and m2 return the same result (raw bits 0x5f5f818b) while m3 returns one higher (raw bits 0x5f5f818c)

For this test case, is the definition of "correct" architecture-dependent?

No, we should be deterministic across all platforms for such a case; and correspondingly the added test should be enabled for all platforms as well, not just be Arm64 specific

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out the ulong -> float cast on x64 is represented in IR as ulong -> double -> float, while on ARM64, we combine it into one cast during morph. I'm guessing we have to enable that path for all platforms, too.

Copy link
Member

@jakobbotsch jakobbotsch Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the importer is creating ulong -> double -> float for ulong -> float conversions in IL? That sounds wrong given that those are not equivalent (as the example shows). If morph is making that transformation in the opposite direction it likewise sounds wrong.

Copy link
Member

@tannergooding tannergooding Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roslyn is likely relying on the JIT implementation for the constant folding here, not realizing it's incorrect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree the helper has downsides too. But Roslyn could avoid using it when not targeting something with it available. For example, I imagine it could emit the equivalent of

ulong x = ...;
float f = x > long.MaxValue ? (float)(-x) + (float)0x8000000000000000 : (float)(long)x;

in those cases (or whatever the right way is to do the conversion manually). Signed long -> float conversion is representable with just conv.r4, I believe.

I don't think we need to try to fix this in .NET 9, but we may want to unify the debug/release behavior on something. We should probably open an issue for more discussion about the problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to try to fix this in .NET 9, but we may want to unify the debug/release behavior on something. We should probably open an issue for more discussion about the problem.

I've opened #106646 to track this. I think ARM64 was the only platform where Debug/Release behavior could diverge, due to const-folding always doing a two-step cast and ucvtf/scvtf being able to encode ulong/long -> float casts. Should we take this PR as-is for .NET 9?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this also repros on AVX512 capable hardware for x64.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I can repro it on my AVX512 machine, though this PR's changes seem to fix it. Here's the Debug codegen:

G_M27646_IG01:  ;; offset=0x0000
       push     rbp
       sub      rsp, 48
       lea      rbp, [rsp+0x30]
       xor      eax, eax
       mov      qword ptr [rbp-0x08], rax
       mov      dword ptr [rbp-0x0C], eax
                                                ;; size=19 bbWeight=1 PerfScore 4.00
G_M27646_IG02:  ;; offset=0x0013
       cmp      dword ptr [(reloc 0x7ffeb59917c0)], 0
       je       SHORT G_M27646_IG04
                                                ;; size=9 bbWeight=1 PerfScore 4.00
G_M27646_IG03:  ;; offset=0x001C
       call     CORINFO_HELP_DBG_IS_JUST_MY_CODE
                                                ;; size=5 bbWeight=0.50 PerfScore 0.50
G_M27646_IG04:  ;; offset=0x0021
       nop
       mov      rax, 0xDF818B7FE778AFCF
       mov      qword ptr [rbp-0x08], rax
       mov      eax, -1
       mov      eax, eax
       or       rax, qword ptr [rbp-0x08]
       vcvtusi2ss xmm0, rax
       vmovss   dword ptr [rbp-0x0C], xmm0
       vmovss   xmm0, dword ptr [rbp-0x0C]
       call     [System.BitConverter:SingleToUInt32Bits(float):uint]
       mov      dword ptr [rbp-0x10], eax
       mov      ecx, dword ptr [rbp-0x10]
       call     [System.Console:WriteLine(uint)]
       nop
       nop
                                                ;; size=62 bbWeight=1 PerfScore 22.50
G_M27646_IG05:  ;; offset=0x005F
       add      rsp, 48
       pop      rbp
       ret
                                                ;; size=6 bbWeight=1 PerfScore 1.75

And here's Release:

G_M27646_IG01:  ;; offset=0x0000
       sub      rsp, 40
                                                ;; size=4 bbWeight=1 PerfScore 0.25
G_M27646_IG02:  ;; offset=0x0004
       mov      ecx, 0x5F5F818B
       call     [System.Console:WriteLine(uint)]
       nop
                                                ;; size=12 bbWeight=1 PerfScore 3.50
G_M27646_IG03:  ;; offset=0x0010
       add      rsp, 40
       ret
                                                ;; size=5 bbWeight=1 PerfScore 1.25

Both now output 1600094603. I can try tweaking the test to only run on ARM64, or on x64 if Avx512.IsSupported is true.

}

uint64_t FloatingPointUtils::convertDoubleToUInt64(double d)
Expand Down
22 changes: 22 additions & 0 deletions src/tests/JIT/Regression/JitBlue/Runtime_106338/Runtime_106338.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

// Generated by Fuzzlyn v2.2 on 2024-08-13 00:04:04
// Run on Arm64 MacOS
// Seed: 13207615092246842583-vectort,vector64,vector128,armadvsimd,armadvsimdarm64,armaes,armarmbase,armarmbasearm64,armcrc32,armcrc32arm64,armdp,armrdm,armrdmarm64,armsha1,armsha256
// Reduced from 226.8 KiB to 0.4 KiB in 00:02:12
// Debug: Outputs 1600094603
// Release: Outputs 1600094604
using System;
using Xunit;

public class Runtime_106338
{
[Fact]
public static void TestEntryPoint()
{
ulong vr10 = 16105307123914158031UL;
float vr11 = 4294967295U | vr10;
Assert.Equal(1600094603U, BitConverter.SingleToUInt32Bits(vr11));
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<Optimize>True</Optimize>
<!-- Needed for CLRTestTargetUnsupported -->
<RequiresProcessIsolation>true</RequiresProcessIsolation>
<CLRTestTargetUnsupported Condition="'$(TargetArchitecture)' != 'arm64' or '$(RuntimeFlavor)' != 'coreclr'">true</CLRTestTargetUnsupported>
</PropertyGroup>
<ItemGroup>
<Compile Include="$(MSBuildProjectName).cs" />
</ItemGroup>
</Project>
Loading