OpenZFS 2.0.x and Xeon Phi incompatibility #11937
Comments
That's the AVX512BW-only function, which indeed cannot work on KNL.
It seems when the offending function is compiled-in, the benchmark [...] It should have its own fletcher_4_avx512bw_valid, but it is using [...]
@thedarave Can you try what happens with https://github.com/rdolbeau/zfs/tree/fletcher_avx512bw_fix ?
This gets past the modprobe phase. Are there any system-level tests that you'd recommend performing or would like to see performed?
@thedarave Maybe check the bench numbers in /proc/spl/kstat/zfs/fletcher_4_bench: avx512bw should not be there, and avx512[f] should be, and is likely the best (the important thing is that it wasn't disabled). This should be a fairly safe patch, as it just disables the offending function on machines where it won't work.
As requested.
Closes openzfs#11937 Signed-off-by: Romain Dolbeau <[email protected]>
@thedarave Thanks! LGTM. And sorry for having introduced the bug in the first place.
@rdolbeau No worries! I'm sort of an edge case anyway. Going to close this out unless someone else has something they want me to test.
Introduce a specific valid function for avx512f+avx512bw (instead of checking only for avx512f). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Adam Moss <[email protected]> Signed-off-by: Romain Dolbeau <[email protected]> Closes #11937 Closes #11938
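As a rough illustration of what the commit message describes, here is a minimal userspace sketch of the two validity checks. It is not the actual OpenZFS code: `struct cpu_features` is a hypothetical stand-in for the kernel's CPU feature queries, and `fletcher_4_avx512f_valid` is an assumed name chosen to mirror the `fletcher_4_avx512bw_valid` mentioned in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the CPU feature queries the module performs.
 * Only the two features relevant to this issue are modelled.
 */
struct cpu_features {
	bool avx512f;	/* AVX-512 Foundation */
	bool avx512bw;	/* AVX-512 Byte and Word */
};

/* The Foundation-only implementation needs nothing beyond avx512f. */
static bool
fletcher_4_avx512f_valid(const struct cpu_features *cpu)
{
	assert(cpu != NULL);
	return (cpu->avx512f);
}

/*
 * The byte/word implementation must check both features. Checking only
 * avx512f here is the kind of bug reported in this issue: the function
 * would be considered usable on a CPU that lacks avx512bw.
 */
static bool
fletcher_4_avx512bw_valid(const struct cpu_features *cpu)
{
	assert(cpu != NULL);
	return (cpu->avx512f && cpu->avx512bw);
}
```

With a Xeon Phi style feature set (`avx512f` set, `avx512bw` clear), the first check passes and the second fails, so only the Foundation-only implementation would be offered to the benchmark.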
System information
Describe the problem you're observing
I tried moving one of my ASRockRack 2U4N-F/X200/X200D6HM Xeon Phi nodes to OpenZFS 2.0 just for the hell of it this weekend, and found that no matter what I used, it crashed in the same place: insmod zcommon.ko. This included both native Ubuntu builds from 21.04 (2.0.2), and PPA repos for 20.04 (2.0.4). Here's the crash message I see:
If I am reading this right, it looks like it is crashing in fletcher_4_avx512.c while trying to use the byteswap AVX instruction. This instruction doesn't exist in the AVX512 instruction set on the Xeon Phi ( https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512 ). As you'll note, it does support the AVX512 Foundation instructions and Conflict Detection instructions, but nothing else "common" beyond that.
Here's the top /proc/cpuinfo ID in case it is useful:
Describe how to reproduce the problem
Execute
while running on a Xeon Phi 72xx or on any system whose AVX512 support is limited to the Foundation instructions.
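To tell whether a given machine falls into this category, a small userspace check along the following lines may help. It is only a sketch: the function name `cpu_is_avx512_foundation_only` is made up for illustration, and it relies on the GCC/Clang x86 builtin `__builtin_cpu_supports`, so it reports `false` on other architectures.

```c
#include <stdbool.h>

/*
 * Returns true when the CPU reports AVX-512 Foundation support but not
 * AVX-512 Byte and Word, which is the combination described in this issue.
 * Hypothetical helper; not part of OpenZFS.
 */
static bool
cpu_is_avx512_foundation_only(void)
{
#if defined(__x86_64__) || defined(__i386__)
	/* Populate the compiler runtime's CPU feature data before querying. */
	__builtin_cpu_init();
	return (__builtin_cpu_supports("avx512f") &&
	    !__builtin_cpu_supports("avx512bw"));
#else
	/* The builtins are x86-specific; nothing to report elsewhere. */
	return (false);
#endif
}
```

The same information can be read from the flags line of /proc/cpuinfo by looking for avx512f without avx512bw.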
Include any warning/errors/backtraces from the system logs