OpenCL based ACC-backend and SMM library #406

Merged
merged 59 commits into from
Feb 2, 2021
Changes from 35 commits
Commits
60e24b3
Completed implementation with passing regtests. Included validation i…
hfp Dec 14, 2020
50a4c0b
Introduced USE_ACCEL replacing USE_CUDA, USE_HIP, and USE_OPENCL. Som…
hfp Dec 14, 2020
7871105
Introduced USE_ACCEL replacing USE_CUDA, USE_HIP, and USE_OPENCL. Att…
hfp Dec 14, 2020
8a38693
Collected acc_opencl_synchronous_memops into global acc_opencl_option…
hfp Dec 16, 2020
9662267
Respect compile-time setting (ACC_OPENCL_SVM).
hfp Dec 16, 2020
58ea8a5
Removed support for ACC_OPENCL_STREAM_OOOEXEC as usage depends on in-…
hfp Dec 18, 2020
11619df
Fixed calling clGetMemObjectInfo accidentally with wrong object. Runt…
hfp Dec 18, 2020
45b4e9d
Attempt to fix linker errors with additional test case (HIP/ROCm).
hfp Dec 21, 2020
d82bf42
Fixed warnings about explicitly deprecated CUDA/HIP functions.
hfp Dec 21, 2020
a2a34b9
Another attempt to fix cross-dependency in CUDA/HIP backend.
hfp Dec 21, 2020
4229569
One more attempt to fix cross-dependencies.
hfp Dec 21, 2020
c00b95f
Disabled dbcsr_acc_test for HIP (linker error due to cross-dependency).
hfp Dec 23, 2020
a242fa2
Revert "Fixed warnings about explicitly deprecated CUDA/HIP functions."
hfp Dec 23, 2020
6bdf07e
Prettify.
hfp Dec 23, 2020
a26540e
Improved creating resource/kernel file. Introduced CONSTANT and relat…
hfp Jan 4, 2021
341e9a5
Renamed CONSTANT to GLOBAL and expand GLOBAL to either "constant" or …
hfp Jan 4, 2021
c18d45f
Removed superfluous barrier.
hfp Jan 4, 2021
39c719c
Allow to disable (pre-)transposing B-matrices (to only run the SMM-ke…
hfp Jan 5, 2021
6418bfd
Prepared for tuned kernel (introduced parameters; WIP)
hfp Jan 6, 2021
7609f58
Implemented blocking SMMs into tiles. Introduced (mini-)batchsize (on…
hfp Jan 7, 2021
e17bb7f
Implemented intra-kernel (mini-)batch accumulation (disabled by defau…
hfp Jan 8, 2021
e37fbde
Fixed SMM-kernel for (mini-)batches (1 < BS). Rely 2d-arrays for clar…
hfp Jan 11, 2021
3a7e5f5
Adjusted and fixed work split. Print additional norm (debug). Fixed c…
hfp Jan 13, 2021
fb8cdcd
Removed barrier (mini-batch).
hfp Jan 13, 2021
649f7e2
Fixed array initializer.
hfp Jan 13, 2021
eb419dd
Reintroduced barrier.
hfp Jan 13, 2021
9ce4721
Removed dead code (as suggested).
hfp Jan 13, 2021
7650d1f
Initial auto-tuning script (based in OpenTuner; documentation and req…
hfp Jan 15, 2021
c048f71
Adjusted filename of finally written result.
hfp Jan 15, 2021
56040b6
Prettified Python script.
hfp Jan 15, 2021
55aef80
Fixed file header/banner.
hfp Jan 15, 2021
d9b92ea
Improved performance of SMM-kernel; adjusted tune_multiply.py accordi…
hfp Jan 18, 2021
09910fc
Adjusted filename (max.gflops found), and added newline (final result…
hfp Jan 18, 2021
2856eac
Extend result/file for easier reuse (JSON), and merge JSONs into CSV …
hfp Jan 19, 2021
9808af5
Implemented loading tuned parameters embedded into binary or from file.
hfp Jan 20, 2021
db12552
Fixed issues pointed out by Shellcheck.
hfp Jan 20, 2021
4bb3983
Fixed/worked around initialize/finalize issue.
hfp Jan 21, 2021
e6693bd
Correct initialization/finalization flow (benchmark drivers); includi…
hfp Jan 21, 2021
11ae7d5
Missed workaround for CUDA (#422).
hfp Jan 21, 2021
7d46d54
Added requirements (OpenTuner). Added wrapper script to tune multiple…
hfp Jan 21, 2021
83c0532
Improved console output.
hfp Jan 21, 2021
5b8f8cc
Updated various documentation pieces (WIP).
hfp Jan 21, 2021
b355f03
Allow empty/no choice with respect to USE_ACCEL.
hfp Jan 21, 2021
f7f2f2a
Attempt to CI-test OpenCL backend and LIBSMM.
hfp Jan 21, 2021
630432d
Adjusted CI/build setup: build LIBXSMM and help CMake to find OpenCL.
hfp Jan 21, 2021
24a8a79
Extend PKG_CONFIG_PATH rather than overriding it.
hfp Jan 22, 2021
432ff63
Further adjusted build/run scripts (Daint-CI).
hfp Jan 22, 2021
dba923d
One more attempt to get CI up and running.
hfp Jan 22, 2021
4c45d7b
Disabled Daint-CI runtime tests (temporarily). Prepared revised trans…
hfp Jan 22, 2021
911e8da
Improved finding OpenCL bits (e.g., on Daint).
hfp Jan 22, 2021
61c12bb
Fixed nasty typo. Adjusted default GPU to P100 (to better adhere to D…
hfp Jan 22, 2021
e7a141c
Improved build messages/help.
hfp Jan 26, 2021
a2506c6
Adjusted installation instructions for clarity.
hfp Jan 26, 2021
cadbcae
Adjusted existing documentation to better accommodate/distinct the Op…
hfp Jan 26, 2021
a0c1dcf
Documented auto-tuning.
hfp Jan 27, 2021
e0c4b07
Improved console output (tune_multiply.sh).
hfp Jan 28, 2021
417cfb8
Note about opentuner.db directory. Some additional details and rephrase.
hfp Jan 28, 2021
1b5fd3b
Adjusted separator (tune_multiply.sh).
hfp Jan 28, 2021
711d289
Improved documentation with some sample output (auto-tuning).
hfp Jan 28, 2021
2 changes: 1 addition & 1 deletion .ci/daint.cscs.ch/cray.build.sh
@@ -30,7 +30,7 @@ cd "${SCRATCH}/${BUILD_TAG}.cray"

  cmake \
  -DCMAKE_SYSTEM_NAME=CrayLinuxEnvironment \
- -DUSE_CUDA=ON \
+ -DUSE_ACCEL=cuda \
  -DWITH_GPU=P100 \
  -DBLAS_FOUND=ON -DBLAS_LIBRARIES="-lsci_cray_mpi_mp" \
  -DLAPACK_FOUND=ON -DLAPACK_LIBRARIES="-lsci_cray_mpi_mp" \
2 changes: 1 addition & 1 deletion .ci/daint.cscs.ch/gnu.build.sh
@@ -28,7 +28,7 @@ cd "${SCRATCH}/${BUILD_TAG}.gnu"
  cmake \
  -DCMAKE_SYSTEM_NAME=CrayLinuxEnvironment \
  -DCMAKE_CROSSCOMPILING_EMULATOR="" \
- -DUSE_CUDA=ON \
+ -DUSE_ACCEL=cuda \
  -DWITH_GPU=P100 \
  -DBLAS_FOUND=ON -DBLAS_LIBRARIES="-lsci_gnu_mpi_mp" \
  -DLAPACK_FOUND=ON -DLAPACK_LIBRARIES="-lsci_gnu_mpi_mp" \
2 changes: 1 addition & 1 deletion .ci/daint.cscs.ch/intel.build.sh
@@ -31,7 +31,7 @@ cd "${SCRATCH}/${BUILD_TAG}.intel"

  cmake \
  -DCMAKE_SYSTEM_NAME=CrayLinuxEnvironment \
- -DUSE_CUDA=ON \
+ -DUSE_ACCEL=cuda \
  -DWITH_GPU=P100 \
  -DBLAS_FOUND=ON -DBLAS_LIBRARIES="-lsci_intel_mpi_mp" \
  -DLAPACK_FOUND=ON -DLAPACK_LIBRARIES="-lsci_intel_mpi_mp" \
2 changes: 1 addition & 1 deletion .github/workflows/testing-linux.yml
@@ -101,7 +101,7 @@ jobs:
  cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DUSE_${{ matrix.use_openmp }} \
- -DUSE_HIP=ON \
+ -DUSE_ACCEL=hip \
  -DWITH_GPU=Mi50 \
  ..
  - name: Build
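Reviewer note: the CI changes above all follow one pattern, with `-DUSE_CUDA=ON` becoming `-DUSE_ACCEL=cuda` and `-DUSE_HIP=ON` becoming `-DUSE_ACCEL=hip`. For downstream build scripts that still carry the old switches, the mapping could be sketched as below. The helper name `legacy_to_accel` is hypothetical and not part of DBCSR; the sketch also assumes that an empty `USE_ACCEL` means "no accelerator", as the commit "Allow empty/no choice with respect to USE_ACCEL" suggests.

```shell
#!/bin/sh
# Hypothetical helper (not part of DBCSR): translate the legacy ON/OFF
# switches USE_CUDA and USE_HIP into the single USE_ACCEL option.
# Usage: legacy_to_accel <USE_CUDA value> <USE_HIP value>
legacy_to_accel() {
  use_cuda=${1:-OFF}
  use_hip=${2:-OFF}
  if [ "$use_cuda" = "ON" ] && [ "$use_hip" = "ON" ]; then
    # The old CMakeLists.txt rejected this combination as well.
    echo "USE_CUDA and USE_HIP are mutually exclusive" >&2
    return 1
  elif [ "$use_cuda" = "ON" ]; then
    echo "-DUSE_ACCEL=cuda"
  elif [ "$use_hip" = "ON" ]; then
    echo "-DUSE_ACCEL=hip"
  else
    # Assumption: an empty value selects no accelerator backend.
    echo "-DUSE_ACCEL="
  fi
}
```

A build script might then call `cmake "$(legacy_to_accel ON OFF)" -DWITH_GPU=P100 ..` instead of passing `-DUSE_CUDA=ON` directly.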
62 changes: 38 additions & 24 deletions CMakeLists.txt
@@ -90,15 +90,10 @@ set(USE_SMM
  "Small Matrix Multiplication implementation to use (default: blas)")
  set_property(CACHE USE_SMM PROPERTY STRINGS blas libxsmm)

- option(USE_CUDA "Build with CUDA support" OFF)
- option(USE_HIP "Build with HIP support" OFF)
- # USE_CUDA and USE_HIP are mutually exclusive options: we either compile with
- # nvcc OR with hipcc
- if (USE_CUDA AND USE_HIP)
- message(
- FATAL_ERROR
- "USE_CUDA and USE_HIP options are mutually exclusive. Please choose one.")
- endif ()
+ set(USE_ACCEL
+ ""
+ CACHE STRING "Build with acceleration support (default: none)")
+ set_property(CACHE USE_ACCEL PROPERTY STRINGS opencl cuda hip)

  set(SUPPORTED_CUDA_ARCHITECTURES K20X K40 K80 P100 V100)
  set(SUPPORTED_HIP_ARCHITECTURES Mi50)
@@ -117,21 +112,27 @@ enable_language(Fortran)

  if (WITH_C_API AND WITH_EXAMPLES)
  enable_language(CXX)
+ enable_language(C)
  endif ()

- # we're always using at least C++11
+ # always use at least C++11
  set(CMAKE_CXX_STANDARD 11)

  # =================================================================================================
  # PACKAGE DISCOVERY (compiler configuration can impact package discovery)
+ find_package(PkgConfig)

- # =================================== OpenMP and OpenMP/offload backend
+ # =================================== OpenMP
  if (USE_OPENMP)
  find_package(OpenMP REQUIRED)
  endif ()

+ # =================================== LIBXSMM (rely on pkg-config)
+ if ((USE_SMM MATCHES "libxsmm") OR (USE_ACCEL MATCHES "opencl"))
+ pkg_check_modules(LIBXSMM IMPORTED_TARGET GLOBAL libxsmmf)
+ endif ()
+
  # =================================== BLAS & LAPACK, PkgConfig
- find_package(PkgConfig)
  find_package(LAPACK REQUIRED) # needed for some of the integrated test routines,
  # also calls find_package(BLAS)

@@ -141,8 +142,7 @@ find_package(LAPACK REQUIRED) # needed for some of the integrated test routines,
  # environment for a python interpreter before searching elsewhere in the system.
  # In CMake <3.15, the system is searched before the virtual environment.
  if (NOT Python_EXECUTABLE)
- # If the python interpreter isn't specified as a command line option, look for
- # it:
+ # If the python interpreter is not specified (command line), try finding it:
  find_package(
  Python
  COMPONENTS Interpreter
@@ -185,15 +185,32 @@ endif ()
  if (USE_SMM MATCHES "blas")
  message("-- Using BLAS for Small Matrix Multiplication")
  elseif (USE_SMM MATCHES "libxsmm")
- # rely on pkg-config in order to link against libxsmm
- pkg_check_modules(deps REQUIRED IMPORTED_TARGET GLOBAL libxsmmf)
- message("-- Using libxsmm for Small Matrix Multiplication")
+ if (LIBXSMM_FOUND)
+ message("-- Using libxsmm for Small Matrix Multiplication")
+ else ()
+ message(
+ FATAL_ERROR
+ "LIBXSMM is not found but requested (USE_SMM). "
+ "Please install libxsmm and set PKG_CONFIG_PATH=/path/to/libxsmm/lib")
+ endif ()
  else ()
  message(FATAL_ERROR "Unknown SMM library specified")
  endif ()

- # =================================== GPU backend
- if (USE_CUDA OR USE_HIP)
+ # =================================== GPU backends
+ if (USE_ACCEL MATCHES "opencl")
+ if (NOT LIBXSMM_FOUND)
+ message(
+ FATAL_ERROR
+ "LIBXSMM is not found but it is required for the ACC/OpenCL backend. "
+ "Please install libxsmm and set PKG_CONFIG_PATH=/path/to/libxsmm/lib")
+ endif ()
+
+ find_package(OpenCL REQUIRED)
+ enable_language(C)
+ endif ()
+
+ if (USE_ACCEL MATCHES "cuda|hip")
  enable_language(CXX)
  set(GPU_ARCH_NUMBER_K20X 35)
  set(GPU_ARCH_NUMBER_K40 35)
@@ -203,8 +220,7 @@ if (USE_CUDA OR USE_HIP)
  set(GPU_ARCH_NUMBER_Mi50 gfx906)
  endif ()

- if (USE_CUDA)
-
+ if (USE_ACCEL MATCHES "cuda")
  enable_language(CUDA)
  if (CMAKE_CUDA_COMPILER_VERSION LESS 5.5)
  message(FATAL_ERROR "CUDA version >= 5.5 is required.")
@@ -243,7 +259,6 @@ if (USE_CUDA)
  else ()
  message(STATUS "Found cuBLAS: ${CUBLAS}")
  endif ()
-
  if (WITH_CUDA_PROFILING)
  find_library(
  CUDA_NVTOOLSEXT nvToolsExt
@@ -257,8 +272,7 @@ endif ()

  # inspired from
  # https://github.com/ROCm-Developer-Tools/HIP/tree/master/samples/2_Cookbook/12_cmake_hip_add_executable
- if (USE_HIP)
-
+ if (USE_ACCEL MATCHES "hip")
  # Make sure the GPU required is supported
  list(FIND SUPPORTED_HIP_ARCHITECTURES ${WITH_GPU} GPU_SUPPORTED)
  if (GPU_SUPPORTED EQUAL -1)
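Reviewer note: both new `FATAL_ERROR` messages in this file ask the user to set `PKG_CONFIG_PATH`, and one of the commits above reads "Extend PKG_CONFIG_PATH rather than overriding it." A minimal shell sketch of extending the variable without clobbering an existing value follows. The function name `extend_pkg_config_path` and the example directory are illustrative assumptions, not taken from the PR.

```shell
#!/bin/sh
# Hypothetical helper (not part of DBCSR): prepend a directory to
# PKG_CONFIG_PATH while keeping whatever was already there.
extend_pkg_config_path() {
  dir=$1
  # ${VAR:+...} expands to nothing when VAR is unset or empty, which
  # avoids a stray ":" separator in that case.
  PKG_CONFIG_PATH="${dir}${PKG_CONFIG_PATH:+:${PKG_CONFIG_PATH}}"
  export PKG_CONFIG_PATH
}
```

After `extend_pkg_config_path /path/to/libxsmm/lib`, running `pkg-config --exists libxsmmf` is a quick way to check whether the module queried by `pkg_check_modules` can be found before invoking CMake.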
5 changes: 5 additions & 0 deletions cmake/CompilerConfiguration.cmake
@@ -88,3 +88,8 @@ Please open an issue at https://github.com/cp2k/dbcsr/issues with the reported c
  message("-- CMAKE_CXX_COMPILER_ID: " ${CMAKE_CXX_COMPILER_ID})
  message("-- CMAKE_CXX_COMPILER full path: " ${CMAKE_CXX_COMPILER})
  endif ()
+
+ # inherit C flags from CXX
+ set(CMAKE_C_FLAGS_RELEASE ${CMAKE_CXX_FLAGS_RELEASE})
+ set(CMAKE_C_FLAGS_COVERAGE ${CMAKE_CXX_FLAGS_COVERAGE})
+ set(CMAKE_C_FLAGS_DEBUG ${CMAKE_CXX_FLAGS_DEBUG})
3 changes: 1 addition & 2 deletions docs/guide/2-user-guide/1-installation/index.md
@@ -58,9 +58,8 @@ make
  -DUSE_MPI=<ON|OFF>
  -DUSE_OPENMP=<ON|OFF>
  -DUSE_SMM=<blas|libxsmm>
- -DUSE_CUDA=<OFF|ON>
+ -DUSE_ACCEL=<opencl|cuda|hip>
  -DWITH_CUDA_PROFILING=<OFF|ON>
- -DUSE_HIP=<OFF|ON>
  -DWITH_C_API=<ON|OFF>
  -DWITH_EXAMPLES=<ON|OFF>
  -DWITH_GPU=<P100|K20X|K40|K80|V100|Mi50>
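Reviewer note: the documented choices for `USE_ACCEL` are `opencl`, `cuda`, and `hip`, with an empty value meaning no accelerator. As far as I know, the CMake `STRINGS` cache property only populates the selection list in `cmake-gui`/`ccmake` and does not reject other values, and the `MATCHES` tests in the CMake files are regular-expression matches rather than exact comparisons. A wrapper script that wants a strict check before configuring could do something like the sketch below; `check_accel` is a hypothetical name, not something this PR provides.

```shell
#!/bin/sh
# Hypothetical pre-configure check (not part of DBCSR): accept only the
# documented USE_ACCEL values, or an empty string for "no accelerator".
check_accel() {
  case "$1" in
    "" | opencl | cuda | hip)
      return 0
      ;;
    *)
      echo "unsupported USE_ACCEL value: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `check_accel "$ACCEL" && cmake -DUSE_ACCEL="$ACCEL" ..` would refuse a misspelled value such as `OpenCL` before CMake is run.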
4 changes: 2 additions & 2 deletions examples/CMakeLists.txt
@@ -6,7 +6,7 @@ set(DBCSR_PROGRAM_SRCS_CPP dbcsr_example_3.cpp dbcsr_tensor_example_2.cpp)
  # Compile Fortran examples
  foreach (dbcsr_program_src ${DBCSR_PROGRAM_SRCS_FTN})
  get_filename_component(dbcsr_program_name ${dbcsr_program_src} NAME_WE)
- if (USE_HIP)
+ if (USE_ACCEL MATCHES "hip")
  hip_add_executable(${dbcsr_program_name} ${dbcsr_program_src})
  else ()
  add_executable(${dbcsr_program_name} ${dbcsr_program_src})
@@ -24,7 +24,7 @@ if (WITH_C_API)
  foreach (dbcsr_program_src ${DBCSR_PROGRAM_SRCS_CPP})
  get_filename_component(dbcsr_program_name ${dbcsr_program_src} NAME_WE)
  set(dbcsr_program_name ${dbcsr_program_name}_cpp)
- if (USE_HIP)
+ if (USE_ACCEL MATCHES "hip")
  hip_add_executable(${dbcsr_program_name} ${dbcsr_program_src})
  else ()
  add_executable(${dbcsr_program_name} ${dbcsr_program_src})
39 changes: 28 additions & 11 deletions src/CMakeLists.txt
@@ -105,6 +105,10 @@ add_fypp_sources(
  utils/dbcsr_toollib.F
  work/dbcsr_work_operations.F)

+ set(DBCSR_OPENCL_SRCS
+ acc/opencl/acc_opencl.c acc/opencl/acc_opencl_event.c
+ acc/opencl/acc_opencl_mem.c acc/opencl/acc_opencl_stream.c)
+
  set(DBCSR_CUDA_SRCS
  acc/cuda/acc_cublas.cu
  acc/cuda/acc_cuda.cpp
@@ -141,13 +145,9 @@ add_library(dbcsr ${DBCSR_SRCS})
  set_target_properties(dbcsr PROPERTIES VERSION ${dbcsr_VERSION}
  SOVERSION ${dbcsr_APIVERSION})

- if (TARGET PkgConfig::deps)
- target_link_libraries(dbcsr PRIVATE PkgConfig::deps)
- endif ()
-
- if (USE_SMM MATCHES "libxsmm")
- # linker/include flags are managed by pkg-config (above)
+ if (LIBXSMM_FOUND)
  target_compile_definitions(dbcsr PRIVATE __LIBXSMM)
+ target_link_libraries(dbcsr PRIVATE PkgConfig::LIBXSMM)
  endif ()

  if (BLAS_LIBRARIES MATCHES "mkl_")
@@ -203,6 +203,25 @@ if (OpenMP_FOUND)
  target_link_libraries(dbcsr PRIVATE OpenMP::OpenMP_Fortran)
  endif ()

+ # =================================================================================================
+ # DBCSR LIBRARY's OPENCL BACKEND
+
+ if (USE_ACCEL MATCHES "opencl")
+ target_compile_definitions(dbcsr PRIVATE __DBCSR_ACC)
+ target_link_libraries(dbcsr PRIVATE ${OpenCL_LIBRARY})
+
+ # OpenCL backend
+ set(DBCSR_ACC_SRCS ${DBCSR_OPENCL_SRCS})
+ add_library(acc OBJECT ${DBCSR_ACC_SRCS})
+ target_compile_definitions(acc PRIVATE __OPENCL)
+ # account for DBCSR not calling libsmm_acc_init() (DBCSR only calls acc_init)
+ target_compile_definitions(acc PRIVATE __DBCSR_ACC)
+ target_include_directories(acc PRIVATE ${OpenCL_INCLUDE_DIRS})
+ target_sources(dbcsr PRIVATE $<TARGET_OBJECTS:acc>)
+ add_subdirectory(acc/opencl/smm)
+ target_sources(dbcsr PRIVATE $<TARGET_OBJECTS:libsmm_acc>)
+ endif ()
+
  # =================================================================================================
  # DBCSR LIBRARY's CUDA BACKEND

@@ -240,7 +259,7 @@ function (CUDA_CONVERT_FLAGS EXISTING_TARGET)
  )
  endfunction ()

- if (USE_CUDA)
+ if (USE_ACCEL MATCHES "cuda")
  if (${CMAKE_VERSION} VERSION_LESS 3.16)
  # workaround for CUDA support with CMake <3.16, see also see
  # https://gitlab.kitware.com/cmake/cmake/issues/17929 and
@@ -296,8 +315,7 @@ endif ()
  # =================================================================================================
  # DBCSR LIBRARY's HIP BACKEND

- if (USE_HIP)
-
+ if (USE_ACCEL MATCHES "hip")
  if (USE_OPENMP)
  set(HIP_HIPCC_FLAGS "${HIP_HIPCC_FLAGS} ${OpenMP_CXX_FLAGS}")
  endif ()
@@ -335,7 +353,6 @@ if (USE_HIP)

  target_compile_definitions(dbcsr PRIVATE __DBCSR_ACC)
  target_compile_definitions(dbcsr PRIVATE __HIP)
-
  endif ()

  # =================================================================================================
@@ -401,7 +418,7 @@ write_basic_package_version_file(
  "${CMAKE_CURRENT_BINARY_DIR}/DBCSRConfigVersion.cmake"
  VERSION "${dbcsr_VERSION}"
  COMPATIBILITY SameMajorVersion)
- if (USE_HIP)
+ if (USE_ACCEL MATCHES "hip")
  install(
  EXPORT libsmm_accTargets
  NAMESPACE "${config_namespace}"
2 changes: 1 addition & 1 deletion src/acc/PACKAGE
@@ -1,5 +1,5 @@
  {
  "description": "Generic accelerator API",
  "archive": "libdbcsr",
- "requires": ["../base", "cuda", "hip", "libsmm_acc"]
+ "requires": ["../base", "cuda", "hip", "opencl", "libsmm_acc"]
  }