[SystemZ] Bad Codegen for vec_gfmsum_accum_128 #109113

saitama951 · 2024-09-18T09:38:44Z

Recently the dotnet team started seeing Compression test case failures. These tests fail when zlib-ng is compiled with clang.
on deeper inspection I have managed to extract a test program from zlib-ng (Link)

#include "stdio.h"
#include <vecintrin.h>
typedef unsigned char uv16qi __attribute__((vector_size(16)));
typedef unsigned int uv4si __attribute__((vector_size(16)));
typedef unsigned long long uv2di __attribute__((vector_size(16)));

int main()
{
        const uv2di r2r1 = {0x1C6E41596, 0x154442BD4};
        uv2di v1 = {7381244131595332141, 2315514454429938015};
        uv16qi part1 = {97, 116, 115, 32, 109, 111, 114, 102, 32, 115, 101, 110, 105, 108, 32, 100};
        uv2di result = (uv2di)vec_gfmsum_accum_128(r2r1, v1, part1);
        printf("value 1: %llu\n value 2: %llu\n", result[0],result[1]);
        return 0;
}

The results are as follows with gcc and clang:

[sanjam@s390x ~]$ gcc -g bug-clang.c -march=z15 -mzvector -o b.out
[sanjam@s390x ~]$ ./b.out 
value 1: 7022364300429628393
 value 2: 4831923049869144086
 
 
[sanjam@s390x ~]$ clang -g bug-clang.c -march=z15 -fzvector
[sanjam@s390x ~]$ ./a.out 
value 1: 1591483802437686806
 value 2: 1591483802437686806

now on inspecting the disassembly I see this:

static inline __ATTRS_o_ai __vector unsigned char
vec_gfmsum_accum_128(__vector unsigned long long __a,
                     __vector unsigned long long __b,
                     __vector unsigned char __c) {
  return (__vector unsigned char)
         __builtin_s390_vgfmag(__a, __b, (unsigned __int128)__c);
    1258:       e7 00 b1 08 30 06       vl      %v0,264(%r11),3
    125e:       e7 10 b0 f8 30 06       vl      %v1,248(%r11),3
    1264:       e7 20 b0 e8 30 06       vl      %v2,232(%r11),3
    126a:       e7 00 13 00 20 bc       vgfmag  %v0,%v0,%v1,%v2
    1270:       e7 00 00 03 20 21       vlgvf   %r0,%v0,3
  return (__vector unsigned char)
    1276:       e7 00 00 00 00 62       vlvgp   %v0,%r0,%r0
    127c:       e7 00 00 07 00 4d       vrepb   %v0,%v0,7
    1282:       e7 00 b0 a0 30 0e       vst     %v0,160(%r11),3

here the sequence

    1270:       e7 00 00 03 20 21       vlgvf   %r0,%v0,3
    1276:       e7 00 00 00 00 62       vlvgp   %v0,%r0,%r0
    127c:       e7 00 00 07 00 4d       vrepb   %v0,%v0,7

looks strange, I believe this should be directly doing a vst after the vgfmag?

clang version:

clang version 20.0.0git (https://github.com/llvm/llvm-project a26ec542371652e1d774696e90016fd5b0b1c191)
Target: s390x-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/sanjam/llvm-project/build/bin

cc: @uweigand @JonPsson

The text was updated successfully, but these errors were encountered:

uweigand · 2024-09-18T15:55:50Z

Good catch, this does indeed look wrong. It appears I introduced a regression here: #74625 due to a misinterpretation of the type casting rules:

static inline __ATTRS_o_ai __vector unsigned char
vec_gfmsum_accum_128(__vector unsigned long long __a,
                     __vector unsigned long long __b,
                     __vector unsigned char __c) {
  return (__vector unsigned char)
         __builtin_s390_vgfmag(__a, __b, (unsigned __int128)__c);
}

I assumed that casts between a vector and a scalar of the same size (i.e. any Z vector type and __int128) would be just a bitcast. However, this is only true for the vector -> int128 conversion. A int128 -> vector conversion actually triggers the AltiVec splat rule, which is not what we want here.

I'll come up with a fix.

PR llvm#74625 introduced a regression in the code generated for the following set of intrinsic: vec_add_u128, vec_addc_u128, vec_adde_u128, vec_addec_u128 vec_sub_u128, vec_subc_u128, vec_sube_u128, vec_subec_u128 vec_sum_u128, vec_msum_u128 vec_gfmsum_128, vec_gfmsum_accum_128 This is because the new code incorrectly assumed that a cast from "unsigned __int128" to "vector unsigned char" would simply be a bitcast re-interpretation; instead, this cast actually truncates the __int128 to char and splats the result. Fixed by adding an intermediate cast via a single-element 128-bit integer vector. Fixes: llvm#109113

johnplatts · 2024-10-02T21:31:49Z

I had raised a similar issue (involving the same bug in the Clang vecintrin.h header) in issue #102911.

uweigand · 2024-10-07T13:10:47Z

I had raised a similar issue (involving the same bug in the Clang vecintrin.h header) in issue #102911.

Ah, sorry, I didn't see that. Do you still need a backport to LLVM 19? Note that LLVM 18 unfortunately also has the same bug.

uweigand · 2024-10-07T13:12:49Z

/cherry-pick baf9b7d

llvmbot · 2024-10-07T13:17:33Z

/cherry-pick baf9b7d

Error: Command failed due to missing milestone.

uweigand · 2024-10-07T13:18:47Z

/cherry-pick baf9b7d

PR llvm#74625 introduced a regression in the code generated for the following set of intrinsic: vec_add_u128, vec_addc_u128, vec_adde_u128, vec_addec_u128 vec_sub_u128, vec_subc_u128, vec_sube_u128, vec_subec_u128 vec_sum_u128, vec_msum_u128 vec_gfmsum_128, vec_gfmsum_accum_128 This is because the new code incorrectly assumed that a cast from "unsigned __int128" to "vector unsigned char" would simply be a bitcast re-interpretation; instead, this cast actually truncates the __int128 to char and splats the result. Fixed by adding an intermediate cast via a single-element 128-bit integer vector. Fixes: llvm#109113 (cherry picked from commit baf9b7d)

llvmbot · 2024-10-07T13:23:52Z

/pull-request #111376

PR llvm#74625 introduced a regression in the code generated for the following set of intrinsic: vec_add_u128, vec_addc_u128, vec_adde_u128, vec_addec_u128 vec_sub_u128, vec_subc_u128, vec_sube_u128, vec_subec_u128 vec_sum_u128, vec_msum_u128 vec_gfmsum_128, vec_gfmsum_accum_128 This is because the new code incorrectly assumed that a cast from "unsigned __int128" to "vector unsigned char" would simply be a bitcast re-interpretation; instead, this cast actually truncates the __int128 to char and splats the result. Fixed by adding an intermediate cast via a single-element 128-bit integer vector. Fixes: llvm#109113 (cherry picked from commit baf9b7d)

github-actions bot added the new issue label Sep 18, 2024

EugeneZelenko added backend:SystemZ miscompilation and removed new issue labels Sep 18, 2024

uweigand self-assigned this Sep 18, 2024

uweigand closed this as completed in baf9b7d Sep 19, 2024

EugeneZelenko added the clang:headers Headers provided by Clang, e.g. for intrinsics label Sep 19, 2024

uweigand added this to the LLVM 19.X Release milestone Oct 7, 2024

github-project-automation bot added this to LLVM Release Status Oct 7, 2024

github-project-automation bot moved this to Needs Triage in LLVM Release Status Oct 7, 2024

EugeneZelenko added the release:backport label Oct 7, 2024

tru moved this from Needs Triage to Done in LLVM Release Status Oct 10, 2024

fneddy mentioned this issue Jan 8, 2025

s390x Clang MSAN CI run fails CRC32-VX tests zlib-ng/zlib-ng#1845

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SystemZ] Bad Codegen for vec_gfmsum_accum_128 #109113

[SystemZ] Bad Codegen for vec_gfmsum_accum_128 #109113

saitama951 commented Sep 18, 2024 •

edited

Loading

uweigand commented Sep 18, 2024

johnplatts commented Oct 2, 2024

uweigand commented Oct 7, 2024

uweigand commented Oct 7, 2024

llvmbot commented Oct 7, 2024

uweigand commented Oct 7, 2024

llvmbot commented Oct 7, 2024

[SystemZ] Bad Codegen for vec_gfmsum_accum_128 #109113

[SystemZ] Bad Codegen for vec_gfmsum_accum_128 #109113

Comments

saitama951 commented Sep 18, 2024 • edited Loading

uweigand commented Sep 18, 2024

johnplatts commented Oct 2, 2024

uweigand commented Oct 7, 2024

uweigand commented Oct 7, 2024

llvmbot commented Oct 7, 2024

uweigand commented Oct 7, 2024

llvmbot commented Oct 7, 2024

saitama951 commented Sep 18, 2024 •

edited

Loading