Description
Compiling existing x86 SSE/AVX SIMD code to WASM SIMD is very attractive: developers can reuse existing libraries without rewriting them.
However, only the 128-bit subset of the AVX intrinsics is currently supported, and much existing code does not fit within this restriction.
Adding support for the 256-bit AVX intrinsics would expand the applicable scenarios and may also improve performance.
Does Emscripten have a plan for this?
Google Highway currently supports a WASM_EMU256 target (a 2x-unrolled version of wasm128).
In addition, a re-vectorization optimization phase is being developed in the Google V8 JS engine; it can pack two SIMD128 nodes into one SIMD256 node.
Sample code for AVX intrinsics support:
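/* A 256-bit vector represented as two 128-bit halves. */
typedef struct Vec256 {
  __m128 v0; /* lower 128 bits (lanes 0..3) */
  __m128 v1; /* upper 128 bits (lanes 4..7) */
} __m256;

/* 256-bit float add, lowered to two independent 128-bit WASM SIMD adds. */
static __inline__ __m256 __attribute__((__always_inline__, __nodebug__))
_mm256_add_ps(__m256 __a, __m256 __b) {
  __m256 c;
  c.v0 = (__m128)wasm_f32x4_add((v128_t)__a.v0, (v128_t)__b.v0);
  c.v1 = (__m128)wasm_f32x4_add((v128_t)__a.v1, (v128_t)__b.v1);
  return c;
}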
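
To illustrate how the same two-halves pattern might extend beyond `_mm256_add_ps`, here is a rough, standalone sketch covering unaligned load and store as well. It is only an illustration, not a proposed implementation: all `emu_` names are hypothetical, and the GCC/Clang vector extension stands in for `v128_t` / `wasm_f32x4_add` so that the snippet builds with a native compiler outside Emscripten.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: stand-in for a 128-bit WASM SIMD value (v128_t / __m128).
   Uses the GCC/Clang vector extension instead of <wasm_simd128.h>. */
typedef float emu_f32x4 __attribute__((vector_size(16)));

/* Hypothetical 256-bit type made of two 128-bit halves. */
typedef struct emu_m256 {
  emu_f32x4 v0; /* lanes 0..3 */
  emu_f32x4 v1; /* lanes 4..7 */
} emu_m256;

/* Unaligned load of 8 floats: one 128-bit load per half. */
static inline emu_m256 emu_mm256_loadu_ps(const float *p) {
  emu_m256 r;
  memcpy(&r.v0, p, sizeof r.v0);
  memcpy(&r.v1, p + 4, sizeof r.v1);
  return r;
}

/* Unaligned store of 8 floats: one 128-bit store per half. */
static inline void emu_mm256_storeu_ps(float *p, emu_m256 a) {
  memcpy(p, &a.v0, sizeof a.v0);
  memcpy(p + 4, &a.v1, sizeof a.v1);
}

/* 8-lane add: two independent 4-lane adds (the role played by
   wasm_f32x4_add in the sample above). */
static inline emu_m256 emu_mm256_add_ps(emu_m256 a, emu_m256 b) {
  emu_m256 r;
  r.v0 = a.v0 + b.v0;
  r.v1 = a.v1 + b.v1;
  return r;
}

/* out[i] = x[i] + y[i], processed in full groups of 8 floats.
   Any remainder (n % 8 trailing elements) is left untouched. */
static void add_arrays(float *out, const float *x, const float *y, size_t n) {
  for (size_t i = 0; i + 8 <= n; i += 8) {
    emu_m256 a = emu_mm256_loadu_ps(x + i);
    emu_m256 b = emu_mm256_loadu_ps(y + i);
    emu_mm256_storeu_ps(out + i, emu_mm256_add_ps(a, b));
  }
}
```

Calling `add_arrays(out, x, y, 8)` on two 8-element arrays performs one emulated 256-bit add. Since each half is an independent 128-bit operation, this shape is presumably the kind of pattern a re-vectorization pass could pack back into a single 256-bit operation.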