[JDK-8353240] VectorRearrangeTest failures with 64-bit-element 128-bit vectors #11085

Merged
merged 1 commit into from
Apr 25, 2025

graalvmbot
Collaborator

Hi,

This one is nasty. For some inexplicable reason, vpermilpd uses bit 1 (the second-lowest bit) of each element of the index vector to choose which element of the source vector goes into the corresponding destination element. That is, it computes dst[i] = src[(idx[i] >> 1) & 1]. To make matters worse, the Intel SDM says:

The control bits are located at bit 0 of each quadword element (see Figure 5-24). Each control determines which of the source element in an input pair is selected for the destination element. Each pair of source elements must lie in the same 128-bit region as the destination

This is wrong: the control bits are located at bit 1 of each quadword element.
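
The behavior described above can be modeled in scalar Java. This is only a sketch, not the Graal code: the class `VpermilpdModel` and its `permute` method are hypothetical names, it covers only the 128-bit (two-quadword) form, and it assumes, as stated above, that the selector is bit 1 of each control quadword.

```java
import java.util.Arrays;

// Hypothetical scalar model of the 128-bit, variable-control vpermilpd.
// Assumption (from the description above): bit 1 of each control quadword
// selects which of the two source elements is copied.
final class VpermilpdModel {
    static long[] permute(long[] src, long[] ctrl) {
        long[] dst = new long[2];
        for (int i = 0; i < 2; i++) {
            int sel = (int) ((ctrl[i] >>> 1) & 1); // bit 1, not bit 0
            dst[i] = src[sel];
        }
        return dst;
    }

    public static void main(String[] args) {
        long[] src = {10L, 20L};
        long[] swap = {1L, 0L}; // shuffle indices meaning "swap the two lanes"
        // Passing the raw indices as control words ignores bit 0,
        // so both lanes select src[0].
        System.out.println(Arrays.toString(permute(src, swap))); // prints [10, 10]
    }
}
```

Under this model, a shuffle that should swap the two lanes and produce [20, 10] yields [10, 10] when the raw indices are used as control words, which is the kind of mismatch the test observes.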

Coming back to the Mach 5 test, it looks like this:

public void rearrange_long128() {
    VectorShuffle<Long> shuffle = VectorShuffle.fromArray(lspec128, indexes[0], 0);
    for (int i = 0; i < LENGTH; i += lspec128.length()) {
        LongVector.fromArray(lspec128, lsrc, i)
                  .rearrange(shuffle)
                  .intoArray(ldst, i);
    }
}

So why does it fail only with DeoptimizeALot? Because Graal does not vectorize the graph if any VectorAPINode is not intrinsifiable. In this case, the normal compilation does not vectorize because VectorShuffle::fromArray is not intrinsifiable. However, when the method deoptimizes while it is still running, the VM tries to make an OSR compilation starting at the loop head. That compilation does vectorize, because the non-intrinsifiable VectorShuffle::fromArray call is outside the scope of the OSR compilation.

I fixed the implementation. It is hard to believe that x86 does not have an equivalent of vpermq and vpermpd for 128-bit vectors, so we have to shift the index vector left by 1 before performing vpermilpd.
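
The idea of the fix can be sketched in scalar Java as well. This is an illustration under the same assumption (bit 1 of each control quadword is the selector), not the actual change in the compiler; `RearrangeLong128Sketch`, `toControl`, and `rearrange` are hypothetical names.

```java
// Hypothetical sketch of the fix for a 2-lane (128-bit) long vector:
// shift each shuffle index left by 1 so that its bit 0 lands in bit 1,
// where vpermilpd is assumed to read its selector.
final class RearrangeLong128Sketch {
    static long[] toControl(long[] idx) {
        long[] ctrl = new long[idx.length];
        for (int i = 0; i < idx.length; i++) {
            ctrl[i] = idx[i] << 1;
        }
        return ctrl;
    }

    static long[] rearrange(long[] src, long[] idx) {
        long[] ctrl = toControl(idx);
        long[] dst = new long[2];
        for (int i = 0; i < 2; i++) {
            // Scalar stand-in for vpermilpd: select on bit 1 of the control word.
            dst[i] = src[(int) ((ctrl[i] >>> 1) & 1)];
        }
        return dst;
    }
}
```

With the shift in place, `rearrange(new long[]{10L, 20L}, new long[]{1L, 0L})` returns {20, 10}, matching the lane-wise semantics of `rearrange` in the test above.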

Please take a look and leave your reviews, thanks a lot.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Apr 25, 2025
@graalvmbot graalvmbot closed this Apr 25, 2025
@graalvmbot graalvmbot deleted the qam/vpermilpd branch April 25, 2025 19:54
@graalvmbot graalvmbot merged commit 33123de into master Apr 25, 2025
13 checks passed