Description
This is just a comment that @timmoon10 and others may find useful.
I see the following output when compiling Elemental with the Intel 18 beta compiler:
```
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
```
For reference, the relevant code is below, where `EL_SIMD` expands to `_Pragma("omp simd")`:
```cpp
template<typename S,typename T>
void EntrywiseMap
( const Matrix<S>& A, Matrix<T>& B, function<T(const S&)> func )
{
    EL_DEBUG_CSE
    const Int m = A.Height();
    const Int n = A.Width();
    B.Resize( m, n );
    const S* ABuf = A.LockedBuffer();
    T* BBuf = B.Buffer();
    const Int ALDim = A.LDim();
    const Int BLDim = B.LDim();
    EL_PARALLEL_FOR
    for( Int j=0; j<n; ++j )
    {
        EL_SIMD
        for( Int i=0; i<m; ++i )
        {
            BBuf[i+j*BLDim] = func(ABuf[i+j*ALDim]);
        }
    }
}
```
The problem is that calls through `std::function` are hard to vectorize: the type erasure forces an indirect call that the compiler cannot see through. If one wants these loops to vectorize, one likely has to declare the mapped functions as SIMD functions (see e.g. https://software.intel.com/en-us/node/524514 for details). Interestingly enough, the Intel compiler will auto-vectorize lambdas, so if you implement and use `EntrywiseMap` with lambdas (i.e. a template parameter for the functor) instead of `std::function`, then you are likely to get SIMD code.
Another way to realize threaded+vectorized code in Elemental would be to use the C++17 Parallel STL, which Intel has implemented in the Intel 18 beta (although this is currently somewhat irrelevant due to #215 and similar). `std::for_each(pstl::execution::unseq, ...)` generates SIMD code for lambdas. Unfortunately, `unseq` isn't standard (yet), but it's trivial to abstract that away.