Description
This is just a comment that @timmoon10 and others may find useful.
I see the following output when compiling Elemental with the Intel 18 beta compiler:
```
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
```
For reference, the relevant code is below, where `EL_SIMD` expands to `_Pragma("omp simd")`:
```cpp
template<typename S,typename T>
void EntrywiseMap
( const Matrix<S>& A, Matrix<T>& B, function<T(const S&)> func )
{
    EL_DEBUG_CSE
    const Int m = A.Height();
    const Int n = A.Width();
    B.Resize( m, n );
    const S* ABuf = A.LockedBuffer();
    T* BBuf = B.Buffer();
    const Int ALDim = A.LDim();
    const Int BLDim = B.LDim();
    EL_PARALLEL_FOR
    for( Int j=0; j<n; ++j )
    {
        EL_SIMD
        for( Int i=0; i<m; ++i )
        {
            BBuf[i+j*BLDim] = func(ABuf[i+j*ALDim]);
        }
    }
}
```
The problem is that calls through `std::function` are hard to vectorize: the type erasure forces an indirect call that the compiler cannot see through. If one wants these loops to vectorize, one likely has to declare the mapped functions as SIMD functions (see e.g. https://software.intel.com/en-us/node/524514 for details). Interestingly enough, the Intel compiler will auto-vectorize lambdas, so if you implement and use `EntrywiseMap` with lambdas (i.e. a template parameter for the functor) instead of `std::function`, then you are likely to get SIMD code.
Another way to realize threaded+vectorized code in Elemental would be to use the C++17 Parallel STL, which Intel has implemented in the Intel 18 beta (although this is currently somewhat irrelevant due to #215 and similar). `std::for_each(pstl::execution::unseq, ...)` generates SIMD code for lambdas. Unfortunately, `unseq` isn't standard (yet), but it's trivial to abstract that away.