Feature: enable multi-k calculation for CUDA version of module_gint (deepmodeling#4839)
* add find_matrix_offset function to Class hcontainer
* modify dm matrix in gint_rho_gpu.cu
* modify dm matrix in gint_force_gpu.cu
* remove GPU restriction in multi-k
* modify some function names
* modify hRGint in gint_vl_gpu.cu
* enable multi-k calculation in gint_vl_gpu
* add const
* modify related doc
* add two testing cases for multi-k calculation
* fix an error
* replace a test case
* remove parameter "gamma_only" in get_device_flag
* modify cuda.md
* modify cuda.md again
* Update docs/advanced/acceleration/cuda.md
Co-authored-by: Chun Cai <[email protected]>
* Update docs/advanced/acceleration/cuda.md
Co-authored-by: Chun Cai <[email protected]>
* Update docs/advanced/input_files/input-main.md
Co-authored-by: Chun Cai <[email protected]>
* Update cuda.md
* Update cuda.md
---------
Co-authored-by: Chun Cai <[email protected]>
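
A note on the recurring "offset" and "dm matrix" changes above: in the LCAO code path the density matrix is stored block by block per atom pair (and, for multi-k runs, per lattice vector R of the pair), and the GPU grid-integration kernels need flat offsets into a contiguous copy of that storage rather than host-side pointers. The snippet below is a minimal, hypothetical C++ sketch of that bookkeeping only; `BlockSparseMatrix`, `add_block`, and `find_block_offset` are illustrative names and are not the actual `HContainer`/`find_matrix_offset` interface in ABACUS.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Illustrative block-sparse container: one dense block per atom pair, all
// blocks packed back to back in a single buffer that can be copied to the GPU.
// For a multi-k density matrix, the key would also include the lattice vector R.
struct BlockSparseMatrix
{
    std::map<std::pair<int, int>, std::size_t> offsets; // (iat1, iat2) -> start index in `data`
    std::vector<double> data;                           // all blocks stored contiguously

    // Append an nrow x ncol block for the atom pair (iat1, iat2) and record where it starts.
    void add_block(int iat1, int iat2, int nrow, int ncol)
    {
        offsets[{iat1, iat2}] = data.size();
        data.resize(data.size() + static_cast<std::size_t>(nrow) * ncol, 0.0);
    }

    // Offset lookup: where does the block of atom pair (iat1, iat2) start in the flat buffer?
    // Returns size_t(-1) if the pair is not stored (the block is zero and can be skipped).
    std::size_t find_block_offset(int iat1, int iat2) const
    {
        const auto it = offsets.find({iat1, iat2});
        return it == offsets.end() ? static_cast<std::size_t>(-1) : it->second;
    }
};
```

A kernel would then receive the flat `data` buffer together with the looked-up offsets, so each thread block can address its atom-pair block directly.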
docs/advanced/acceleration/cuda.md (5 additions, 8 deletions)
@@ -6,13 +6,13 @@ In ABACUS, we provide the option to use GPU devices to accelerate performance. T

 - **Electronic state data**: (e.g. electronic density) are moved from the GPU to the CPU(s) every scf step.

-- **Acclerated by the NVIDIA libraries**: `cuBLAS` for common linear algebra calculations, `cuSolver` for eigen values/vectors, and `cuFFT` for the conversions between the real and recip spaces.
+- **Accelerated by the NVIDIA libraries**: `cuBLAS` for common linear algebra calculations, `cuSolver` for eigen values/vectors, and `cuFFT` for the conversions between the real and recip spaces.

 - **Multi GPU supprted**: Using multiple MPI tasks will often give the best performance. Note each MPI task will be bind to a GPU device with automatically computing load balancing.

 - **Parallel strategy**: K point parallel.

-Unlike PW basis, only the grid integration module (module_gint) and the diagonalization of the Hamiltonian matrix (module_hsolver) have been implemented with GPU acceleration under LCAO basis, and the acceleration is limited to gamma only calculation. Additionally, LCAO basis does not support multi-GPU acceleration. Both the grid integration module and the Hamiltonian matrix solver only support acceleration on a single GPU.
+Unlike PW basis, only the grid integration module (module_gint) and the diagonalization of the Hamiltonian matrix (module_hsolver) have been implemented with GPU acceleration under LCAO basis.

 ## Required hardware/software

@@ -31,17 +31,14 @@ Check the [Advanced Installation Options](https://abacus-rtd.readthedocs.io/en/l

 ## Run with the GPU support by editing the INPUT script:

-In `INPUT` file we need to set the value keyword [device](../input_files/input-main.md#device) to be `gpu`.
+In `INPUT` file we need to set the input parameter [device](../input_files/input-main.md#device) to `gpu`. If this parameter is not set, ABACUS will try to determine if there are available GPUs.
+- Set `ks_solver`: For the PW basis, CG, BPCG and Davidson methods are supported on GPU; set the input parameter [ks_solver](../input_files/input-main.md#ks_solver) to `cg`, `bpcg` or `dav`. For the LCAO basis, `cusolver` is supported on GPU.
+- **multi-card**: ABACUS allows for multi-GPU acceleration. If you have multiple GPU cards, you can run ABACUS with several MPI processes, and each process will utilize one GPU card. For example, the command `mpirun -n 2 abacus` will by default launch two GPUs for computation. If you only have one card, this command will only start one GPU.

 ## Examples
 We provides [examples](https://github.com/deepmodeling/abacus-develop/tree/develop/examples/gpu) of gpu calculations.

 ## Known limitations
 PW basis:
-- CG, BPCG and Davidson methods are supported, so the input keyword `ks_solver` can take the values `cg`, `bpcg` or `dav`.
 - Only k point parallelization is supported, so the input keyword `kpar` will be set to match the number of MPI tasks automatically.
 - By default, CUDA architectures 60, 70, 75, 80, 86, and 89 are compiled (if supported). It can be overriden using the CMake variable [`CMAKE_CUDA_ARCHITECTURES`](https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html) or the environmental variable [`CUDAARCHS`](https://cmake.org/cmake/help/latest/envvar/CUDAARCHS.html).
-
-LCAO basis:
-- Does not support multi-k calculation, so if the input keyword `device` is set to `gpu`, the input keyword `gamma_only` can only take the value `1`.
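
As a concrete illustration of the updated instructions above, a minimal INPUT fragment for a PW-basis GPU run might look as follows (illustrative values only; the solver can equally be `cg` or `dav`, and the rest of the input depends on the system):

```
INPUT_PARAMETERS
# minimal GPU example for the PW basis (illustrative values)
calculation   scf
basis_type    pw
device        gpu        # run on GPU; omit to let ABACUS auto-detect available GPUs
ks_solver     bpcg       # cg, bpcg, or dav are supported on GPU for PW
```

With several GPU cards, launching one MPI rank per card (e.g. `mpirun -n 2 abacus`) uses one GPU per process, as the added **multi-card** paragraph describes. If the default list of CUDA architectures under "Known limitations" does not cover your card, the standard CMake variable `CMAKE_CUDA_ARCHITECTURES` (or the `CUDAARCHS` environment variable) can be set when configuring the build.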
docs/advanced/input_files/input-main.md (1 addition, 1 deletion)
@@ -629,7 +629,7 @@ If only one value is set (such as `kspacing 0.5`), then kspacing values of a/b/c
 - cpu: for CPUs via Intel, AMD, or Other supported CPU devices
 - gpu: for GPUs via CUDA or ROCm.

-Known limitations: If using the pw basis, the ks_solver must be cg/bpcg/dav to support `gpu` acceleration. If using the lcao basis, `gamma_only`must be set to `1`, as multi-k calculation is currently not supported for `gpu`. lcao_in_pw also does not support `gpu`.
+Known limitations: `ks_solver`must also be set to the algorithms supported. lcao_in_pw currently does not support `gpu`.
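
For the LCAO case that this pull request enables, the corresponding settings might look like the sketch below (again illustrative values only; a KPT file with more than the Gamma point is assumed, and `cusolver` is the GPU-supported solver named in the documentation above):

```
INPUT_PARAMETERS
# multi-k LCAO calculation on GPU (illustrative values)
calculation   scf
basis_type    lcao
gamma_only    0          # multi-k run; no longer restricted to gamma_only 1 on GPU
device        gpu
ks_solver     cusolver   # GPU-supported solver for the LCAO basis
```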