docs/guide/3-developer-guide/3-programming/2-accelerator-backend/2-libsmm_acc/2-parameters.md (-6 lines)

@@ -14,9 +14,3 @@ The batched matrix-matrix multiplication kernels are templated on:
 The batched transpose kernels are templated on:
 
 * the characteristic dimensions of the transpose: `m, n`
-
-## Predictive parameters
-
-The input features for the predictive models can be 'raw' parameters (left-most column in the figure below), or hand-engineered features 'derived' from the raw features (matrix sizes, launch parameters and resource usage estimations).
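As an aside to the context lines above, which note that the kernels are templated on their characteristic dimensions: the sketch below shows one generic way such dimension-templating can be realized, by substituting `m` and `n` into a kernel source template before compilation. The template text and the `specialize_transpose_source` helper are invented for illustration and are not the actual libsmm_acc kernels or build machinery.

```python
# Hypothetical sketch: specializing a dimension-templated GPU kernel source.
# The CUDA-like template below is illustrative only; it is NOT the libsmm_acc kernel.

TRANSPOSE_TEMPLATE = """
extern "C" __global__ void transpose_{m}_{n}(double *matrices, int *offsets) {{
    // Each block transposes one {m} x {n} matrix of the batch in place.
    __shared__ double buf[{m} * {n}];
    double *mat = matrices + offsets[blockIdx.x];
    for (int i = threadIdx.x; i < {m} * {n}; i += blockDim.x)
        buf[i] = mat[i];
    __syncthreads();
    for (int i = threadIdx.x; i < {m} * {n}; i += blockDim.x) {{
        int row = i / {n}, col = i % {n};   // position in the {m} x {n} input
        mat[col * {m} + row] = buf[i];      // write the transposed element
    }}
}}
"""

def specialize_transpose_source(m: int, n: int) -> str:
    """Return kernel source with the characteristic dimensions m, n substituted."""
    return TRANSPOSE_TEMPLATE.format(m=m, n=n)

if __name__ == "__main__":
    # One specialized kernel per (m, n) pair, e.g. for 13 x 7 blocks:
    print(specialize_transpose_source(13, 7))
```

Baking `m` and `n` into the source this way lets the compiler unroll loops and size shared memory statically, which is the usual motivation for per-dimension specialization.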

src/acc/libsmm_acc/README.md (+5, -13 lines)

@@ -12,12 +12,10 @@ For a description of the library (some details are outdated, but this neverthele
 
 ## Directory Organization
 
-- [`kernels/`](https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/kernels/): GPU kernels (CUDA- and HIP-compatible) for matrix-matrix multiplication and python interface to autotuning and predictive code.
-- [`notebooks/`](https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/notebooks/): jupyter notebooks for exploring data generated from autotuning and prediction.
+- [`kernels/`](https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/kernels/): GPU kernels (CUDA- and HIP-compatible) for matrix-matrix multiplication and Python interface to autotuning code.
 - `generate_*.py`: utility scripts for `libsmm_acc` compilation
 - `libsmm_acc*`: libsmm_acc C++ and CUDA / HIP code
-- [`parameters/`](https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/parameters/): contains `parameters_GPU.json` files. These are sets of matrix-matrix multiplication parameters for different (m, n, k)-triplets optimized for a given GPU card. You can explore these parameters interactively using the [provided jupyter notebook](https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/notebooks/inspect_autotuned_parameters.ipynb)
-- [`predict/`](https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/predict/): scripts for prediction of optimal parameter sets, see [predictive modeling of kernel parameters](https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/predict/README.md)
+- [`parameters/`](https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/parameters/): contains `parameters_GPU.json` files. These are sets of matrix-matrix multiplication parameters for different (m, n, k)-triplets optimized for a given GPU card.
 - [`tune/`](https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/tune/): scripts for autotuning of optimal parameter sets, see [autotuning of kernel parameters](https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/tune/README.md)
 
 ## Matrix-matrix Multiplication Kernels and Parameters
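As an illustration of how the `parameters_GPU.json` files listed in the directory organization above are meant to be used, the sketch below loads such a file and looks up the tuned parameter set for one (m, n, k)-triplet. The file name and the record schema (a JSON list of objects carrying `m`, `n`, `k` and further kernel parameters) are assumptions made for this example; the files under `parameters/` define the authoritative format.

```python
# Hedged sketch: looking up tuned kernel parameters for an (m, n, k)-triplet.
# The schema assumed here (a JSON list of records with "m", "n", "k" keys and
# further kernel parameters) is illustrative; consult the real files under
# parameters/ for the authoritative format.
import json
from typing import Dict, Optional, Tuple


def load_parameters(path: str) -> Dict[Tuple[int, int, int], dict]:
    """Index the parameter records by their (m, n, k)-triplet."""
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)
    return {(rec["m"], rec["n"], rec["k"]): rec for rec in records}


def lookup(params: Dict[Tuple[int, int, int], dict],
           m: int, n: int, k: int) -> Optional[dict]:
    """Return the tuned parameter set for (m, n, k), or None if untuned."""
    return params.get((m, n, k))


if __name__ == "__main__":
    # Example usage with a hypothetical file name:
    params = load_parameters("parameters_GPU.json")
    best = lookup(params, 23, 23, 23)
    print(best if best is not None else "no tuned parameters for this triplet")
```

A lookup miss simply means that the triplet has not been tuned for that card yet, which is the case the autotuning procedure described below addresses.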
@@ -46,7 +44,7 @@ which take between 3 - 7 **parameters** (see figure at the top):
 - **w**: input slab width (width of slab `P_A` and `P_B`)
 - **v**: output slab width (width of slab `P_C`)
 
-The performance of the matrix-matrix multiplication kernels is highly dependent on the choice of algorithm and parameters. For this reason, `libsmm_acc` provides lists of optimal parameters for different GPU cards and different (m, n, k)-triplets. These sets of optimal parameters can be found either through *autotuning* or *predictive modeling*.
+The performance of the matrix-matrix multiplication kernels is highly dependent on the choice of algorithm and parameters. For this reason, `libsmm_acc` provides lists of optimal parameters for different GPU cards and different (m, n, k)-triplets.
 
 ## Contributing to libsmm_acc
 
@@ -56,19 +54,13 @@ We expect users to contribute to the library by providing new optimized kernels
 
 Follow the [autotuning procedure](https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/tune/README.md)
 
-#### Predictive modeling of kernel parameters
-
-Follow the [predictive modeling procedure](https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/predict/README.md)
-
 #### Adding a new kernel
 
 1. Choose a kernel `name`
 
 2. Add the kernel's code (must be able to compile by both `nvcc` and `hip`) in file `kernels/smm_acc_dnt_name.h`
 
-3. Add python kernel class inheriting from base class `kernels/smm_acc_dnt_name.py`
-
-4. Add the new kernel to the `kernel_algorithm` data structure in [`kernels/smm_acc_predict.py`](https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/kernels/smm_acc_predict.py)
+3. Add Python kernel class inheriting from base class `kernels/smm_acc_dnt_name.py`
 
 #### Adding support for a new GPU card
 
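For step 3 of "Adding a new kernel" above, a rough sketch of a Python kernel class deriving from a shared base is given below. Every name in it (`KernelBase`, `KernelName`, the chosen parameters) is a placeholder invented for illustration; the real base class and the attributes it requires are defined by the existing kernel classes under `kernels/`.

```python
# Hedged sketch for step 3 above: a Python kernel class deriving from a common
# base class. KernelBase and its interface are placeholders for illustration;
# the real base class lives under kernels/ and defines the required attributes.


class KernelBase:
    """Minimal stand-in for the shared kernel base class."""

    def __init__(self, m: int, n: int, k: int, threads: int, grouping: int):
        self.m, self.n, self.k = m, n, k
        self.threads = threads
        self.grouping = grouping

    def launch_parameters(self) -> dict:
        """Parameters common to every kernel variant."""
        return {"threads": self.threads, "grouping": self.grouping}


class KernelName(KernelBase):
    """Hypothetical new kernel 'name', templated on (m, n, k) and tile sizes."""

    algorithm = "name"                          # matches kernels/smm_acc_dnt_name.h
    source_file = "kernels/smm_acc_dnt_name.h"

    def __init__(self, m, n, k, threads, grouping, tile_m, tile_n):
        super().__init__(m, n, k, threads, grouping)
        self.tile_m = tile_m                    # parameters specific to this kernel
        self.tile_n = tile_n

    def launch_parameters(self) -> dict:
        params = super().launch_parameters()
        params.update({"tile_m": self.tile_m, "tile_n": self.tile_n})
        return params


if __name__ == "__main__":
    kernel = KernelName(m=13, n=13, k=13, threads=64, grouping=16,
                        tile_m=2, tile_n=2)
    print(kernel.algorithm, kernel.launch_parameters())
```

The point of the inheritance is that autotuning and compilation code can treat all kernel variants uniformly through the base-class interface, while each variant contributes its own tuning parameters.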
@@ -85,4 +77,4 @@ Follow the [predictive modeling procedure](https://github.com/cp2k/dbcsr/blob/de
 }
 ```
 
-then add matrix-matrix multiplication parameters for this GPU using *autotuning* and *predictive modeling*
+then add matrix-matrix multiplication parameters for this GPU using *autotuning*.