note on EntrywiseMap and vectorization #237

Open
@jeffhammond

This is just a comment that @timmoon10 and others may find useful.

I see the following output when compiling Elemental with Intel 18 beta:

/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"

For reference, the relevant code is below, where EL_SIMD expands to _Pragma("omp simd").

template<typename S,typename T>
void EntrywiseMap
( const Matrix<S>& A, Matrix<T>& B, function<T(const S&)> func )
{
    EL_DEBUG_CSE
    const Int m = A.Height();
    const Int n = A.Width();
    B.Resize( m, n );
    const S* ABuf = A.LockedBuffer();
    T* BBuf = B.Buffer();
    const Int ALDim = A.LDim();
    const Int BLDim = B.LDim();
    EL_PARALLEL_FOR
    for( Int j=0; j<n; ++j )
    {
        EL_SIMD
        for( Int i=0; i<m; ++i )
        {
            BBuf[i+j*BLDim] = func(ABuf[i+j*ALDim]);
        }
    }
}

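The problem is that vectorizing a call through std::function is hard: std::function is type-erased, so the call in the inner loop is an indirect call that the compiler generally cannot inline. If one wants these to vectorize, one likely has to declare them as SIMD functions (see e.g. https://software.intel.com/en-us/node/524514 for details).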
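To make the "SIMD function" idea concrete, here is a minimal sketch using the OpenMP `declare simd` directive. The names `ScaleAndShift` and `ApplyScaleAndShift` are hypothetical and not part of Elemental, and the loop calls the function directly rather than through std::function; whether the loop actually vectorizes depends on the compiler and flags (for example, OpenMP SIMD support has to be enabled, otherwise the pragmas are ignored).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical entrywise function (not from Elemental). "declare simd" asks
// the compiler to also generate a vector variant that SIMD loops can call.
#pragma omp declare simd
inline double ScaleAndShift(double x) { return 2.0 * x + 1.0; }

// Applies ScaleAndShift to every entry of `in`, writing the result to `out`.
void ApplyScaleAndShift(const std::vector<double>& in, std::vector<double>& out)
{
    assert(in.size() == out.size());
    const double* inBuf = in.data();
    double* outBuf = out.data();
    const std::size_t n = in.size();

    // The call target is known at compile time, unlike a std::function call.
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        outBuf[i] = ScaleAndShift(inBuf[i]);
}
```

Note that the directive only helps when the compiler can see which function is being called; wrapping `ScaleAndShift` in a std::function would still hide it behind an indirect call.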

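Interestingly enough, the Intel compiler will auto-vectorize lambdas, so if you implement and use EntrywiseMap with lambdas instead of std::function (for example, by taking the callable as a template parameter so that it is never wrapped in a std::function), then you are likely to get SIMD code.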
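A rough sketch of what a lambda-friendly variant could look like is below. It is not Elemental's API: `ColMajor` is a hypothetical stand-in for `Matrix`, and `EntrywiseMapSketch` is an illustrative name. The only substantive change from the code above is that the callable type `F` is a template parameter, so the compiler sees the lambda body at the call site and can inline it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for El::Matrix: column-major storage with a
// leading dimension. Only what the sketch needs.
template<typename T>
struct ColMajor
{
    std::size_t height, width, ldim;
    std::vector<T> data;
    ColMajor(std::size_t m, std::size_t n)
    : height(m), width(n), ldim(m), data(m*n) {}
};

// Same loop nest as EntrywiseMap, but the callable is a template parameter
// instead of a std::function, so no type erasure is involved.
template<typename S, typename T, typename F>
void EntrywiseMapSketch(const ColMajor<S>& A, ColMajor<T>& B, F func)
{
    assert(A.height == B.height && A.width == B.width);
    const std::size_t m = A.height;
    const std::size_t n = A.width;
    const S* ABuf = A.data.data();
    T* BBuf = B.data.data();
    const std::size_t ALDim = A.ldim;
    const std::size_t BLDim = B.ldim;

    for (std::size_t j = 0; j < n; ++j)
    {
        // Ignored unless OpenMP SIMD support is enabled.
        #pragma omp simd
        for (std::size_t i = 0; i < m; ++i)
            BBuf[i + j*BLDim] = func(ABuf[i + j*ALDim]);
    }
}
```

A call such as `EntrywiseMapSketch(A, B, [](const double& x) { return x * x; });` then deduces `F` as the lambda's own closure type. A std::function overload could still be kept alongside it for callers that need type erasure.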

Another way to realize threaded+vectorized code in Elemental would be to use the C++17 Parallel STL, which Intel has implemented in Intel 18 beta (although this is currently somewhat irrelevant due to #215 and similar). std::for_each( pstl::execution::unseq, ...) generates SIMD code for lambdas. Unfortunately, unseq isn't standard (yet), but it's trivial to abstract that away.
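One possible shape for that abstraction is sketched below. Everything in the `sketch` namespace is hypothetical: the library owns its own policy tags, and only the implementation of the unsequenced overload would need to know about a vendor-specific policy. To stay self-contained, the fallback shown here uses an `omp simd` loop instead of calling the Parallel STL; a build that has Intel's implementation available could presumably forward to std::for_each with the vendor's policy in that overload instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical policy tags owned by the library, not by any standard
// or vendor header.
struct SeqTag {};
struct UnseqTag {};

// Sequential path: plain std::for_each.
template<typename It, typename F>
void ForEach(SeqTag, It first, It last, F f)
{
    std::for_each(first, last, f);
}

// Unsequenced path: portable fallback for random-access iterators.
// The pragma is ignored unless OpenMP SIMD support is enabled.
template<typename It, typename F>
void ForEach(UnseqTag, It first, It last, F f)
{
    const std::ptrdiff_t n = last - first;
    assert(n >= 0);
    #pragma omp simd
    for (std::ptrdiff_t i = 0; i < n; ++i)
        f(first[i]);
}

} // namespace sketch
```

Callers would then write `sketch::ForEach(sketch::UnseqTag{}, v.begin(), v.end(), [](double& x) { x *= 2.0; });` and never mention the non-standard policy directly.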
