Skip to content

[SystemZ] Bad Codegen for vec_gfmsum_accum_128 #109113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
saitama951 opened this issue Sep 18, 2024 · 7 comments
Closed

[SystemZ] Bad Codegen for vec_gfmsum_accum_128 #109113

saitama951 opened this issue Sep 18, 2024 · 7 comments
Assignees
Labels
backend:SystemZ clang:headers Headers provided by Clang, e.g. for intrinsics miscompilation release:backport

Comments

@saitama951
Copy link

saitama951 commented Sep 18, 2024

Recently the dotnet team started seeing Compression test case failures. These tests fail when zlib-ng is compiled with clang.
on deeper inspection I have managed to extract a test program from zlib-ng (Link)

#include "stdio.h"
#include <vecintrin.h>
typedef unsigned char uv16qi __attribute__((vector_size(16)));
typedef unsigned int uv4si __attribute__((vector_size(16)));
typedef unsigned long long uv2di __attribute__((vector_size(16)));

int main()
{
        const uv2di r2r1 = {0x1C6E41596, 0x154442BD4};
        uv2di v1 = {7381244131595332141, 2315514454429938015};
        uv16qi part1 = {97, 116, 115, 32, 109, 111, 114, 102, 32, 115, 101, 110, 105, 108, 32, 100};
        uv2di result = (uv2di)vec_gfmsum_accum_128(r2r1, v1, part1);
        printf("value 1: %llu\n value 2: %llu\n", result[0],result[1]);
        return 0;
}

The results are as follows with gcc and clang:

[sanjam@s390x ~]$ gcc -g bug-clang.c -march=z15 -mzvector -o b.out
[sanjam@s390x ~]$ ./b.out 
value 1: 7022364300429628393
 value 2: 4831923049869144086
 
 
[sanjam@s390x ~]$ clang -g bug-clang.c -march=z15 -fzvector
[sanjam@s390x ~]$ ./a.out 
value 1: 1591483802437686806
 value 2: 1591483802437686806

now on inspecting the disassembly I see this:

static inline __ATTRS_o_ai __vector unsigned char
vec_gfmsum_accum_128(__vector unsigned long long __a,
                     __vector unsigned long long __b,
                     __vector unsigned char __c) {
  return (__vector unsigned char)
         __builtin_s390_vgfmag(__a, __b, (unsigned __int128)__c);
    1258:       e7 00 b1 08 30 06       vl      %v0,264(%r11),3
    125e:       e7 10 b0 f8 30 06       vl      %v1,248(%r11),3
    1264:       e7 20 b0 e8 30 06       vl      %v2,232(%r11),3
    126a:       e7 00 13 00 20 bc       vgfmag  %v0,%v0,%v1,%v2
    1270:       e7 00 00 03 20 21       vlgvf   %r0,%v0,3
  return (__vector unsigned char)
    1276:       e7 00 00 00 00 62       vlvgp   %v0,%r0,%r0
    127c:       e7 00 00 07 00 4d       vrepb   %v0,%v0,7
    1282:       e7 00 b0 a0 30 0e       vst     %v0,160(%r11),3

here the sequence

    1270:       e7 00 00 03 20 21       vlgvf   %r0,%v0,3
    1276:       e7 00 00 00 00 62       vlvgp   %v0,%r0,%r0
    127c:       e7 00 00 07 00 4d       vrepb   %v0,%v0,7

looks strange, I believe this should be directly doing a vst after the vgfmag?

clang version:

clang version 20.0.0git (https://github.com/llvm/llvm-project a26ec542371652e1d774696e90016fd5b0b1c191)
Target: s390x-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/sanjam/llvm-project/build/bin

cc: @uweigand @JonPsson

@uweigand
Copy link
Member

Good catch, this does indeed look wrong. It appears I introduced a regression here: #74625 due to a misinterpretation of the type casting rules:

static inline __ATTRS_o_ai __vector unsigned char
vec_gfmsum_accum_128(__vector unsigned long long __a,
                     __vector unsigned long long __b,
                     __vector unsigned char __c) {
  return (__vector unsigned char)
         __builtin_s390_vgfmag(__a, __b, (unsigned __int128)__c);
}

I assumed that casts between a vector and a scalar of the same size (i.e. any Z vector type and __int128) would be just a bitcast. However, this is only true for the vector -> int128 conversion. A int128 -> vector conversion actually triggers the AltiVec splat rule, which is not what we want here.

I'll come up with a fix.

@uweigand uweigand self-assigned this Sep 18, 2024
@EugeneZelenko EugeneZelenko added the clang:headers Headers provided by Clang, e.g. for intrinsics label Sep 19, 2024
tmsri pushed a commit to tmsri/llvm-project that referenced this issue Sep 19, 2024
PR llvm#74625 introduced a regression in the code generated for the
following set of intrinsic:
  vec_add_u128, vec_addc_u128, vec_adde_u128, vec_addec_u128
  vec_sub_u128, vec_subc_u128, vec_sube_u128, vec_subec_u128
  vec_sum_u128, vec_msum_u128
  vec_gfmsum_128, vec_gfmsum_accum_128

This is because the new code incorrectly assumed that a cast
from "unsigned __int128" to "vector unsigned char" would simply
be a bitcast re-interpretation; instead, this cast actually
truncates the __int128 to char and splats the result.

Fixed by adding an intermediate cast via a single-element
128-bit integer vector.

Fixes: llvm#109113
@johnplatts
Copy link

I had raised a similar issue (involving the same bug in the Clang vecintrin.h header) in issue #102911.

@uweigand
Copy link
Member

uweigand commented Oct 7, 2024

I had raised a similar issue (involving the same bug in the Clang vecintrin.h header) in issue #102911.

Ah, sorry, I didn't see that. Do you still need a backport to LLVM 19? Note that LLVM 18 unfortunately also has the same bug.

@uweigand
Copy link
Member

uweigand commented Oct 7, 2024

/cherry-pick baf9b7d

@llvmbot
Copy link
Member

llvmbot commented Oct 7, 2024

/cherry-pick baf9b7d

Error: Command failed due to missing milestone.

@uweigand
Copy link
Member

uweigand commented Oct 7, 2024

/cherry-pick baf9b7d

llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Oct 7, 2024
PR llvm#74625 introduced a regression in the code generated for the
following set of intrinsic:
  vec_add_u128, vec_addc_u128, vec_adde_u128, vec_addec_u128
  vec_sub_u128, vec_subc_u128, vec_sube_u128, vec_subec_u128
  vec_sum_u128, vec_msum_u128
  vec_gfmsum_128, vec_gfmsum_accum_128

This is because the new code incorrectly assumed that a cast
from "unsigned __int128" to "vector unsigned char" would simply
be a bitcast re-interpretation; instead, this cast actually
truncates the __int128 to char and splats the result.

Fixed by adding an intermediate cast via a single-element
128-bit integer vector.

Fixes: llvm#109113
(cherry picked from commit baf9b7d)
@llvmbot
Copy link
Member

llvmbot commented Oct 7, 2024

/pull-request #111376

@tru tru moved this from Needs Triage to Done in LLVM Release Status Oct 10, 2024
tru pushed a commit to llvmbot/llvm-project that referenced this issue Oct 11, 2024
PR llvm#74625 introduced a regression in the code generated for the
following set of intrinsic:
  vec_add_u128, vec_addc_u128, vec_adde_u128, vec_addec_u128
  vec_sub_u128, vec_subc_u128, vec_sube_u128, vec_subec_u128
  vec_sum_u128, vec_msum_u128
  vec_gfmsum_128, vec_gfmsum_accum_128

This is because the new code incorrectly assumed that a cast
from "unsigned __int128" to "vector unsigned char" would simply
be a bitcast re-interpretation; instead, this cast actually
truncates the __int128 to char and splats the result.

Fixed by adding an intermediate cast via a single-element
128-bit integer vector.

Fixes: llvm#109113
(cherry picked from commit baf9b7d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:SystemZ clang:headers Headers provided by Clang, e.g. for intrinsics miscompilation release:backport
Projects
Development

No branches or pull requests

5 participants