Commit 8e72262

Update README
1 parent 1cca705 commit 8e72262

1 file changed (+5, -5 lines)

1 file changed

+5
-5
lines changed

README.rst

Lines changed: 5 additions & 5 deletions
@@ -20,7 +20,7 @@ Requirements:
 
 - ``vkfft.h`` installed in the usual include directories, or in the 'src' directory
 - ``pyopencl`` and the opencl libraries/development tools for the opencl backend
-- ``pycuda`` or `cupy` and CUDA developments tools (`nvcc`) for the cuda backend
+- ``pycuda`` or ``cupy`` and CUDA developments tools (`nvcc`) for the cuda backend
 - ``numpy``
 
 This package can be installed from source using ``python setup.py install`` or ``pip install .``.
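[Editor's note] The requirements in the hunk above list the optional GPU backends. As a purely illustrative aid (not part of pyvkfft itself), the sketch below checks which of those backend Python modules can be imported in the current environment; it uses only the standard library and does not verify the OpenCL drivers or the CUDA toolchain (nvcc) that the requirements also mention.

    # Illustrative only: probe which optional GPU backend modules from the
    # requirements list can be imported. This does not check for OpenCL
    # drivers or the CUDA toolchain (nvcc), only for the Python packages.
    import importlib.util

    backends = {
        "opencl": ["pyopencl"],
        "cuda (pycuda)": ["pycuda"],
        "cuda (cupy)": ["cupy"],
    }

    for name, modules in backends.items():
        missing = [m for m in modules if importlib.util.find_spec(m) is None]
        if missing:
            print(f"{name}: unavailable (missing: {', '.join(missing)})")
        else:
            print(f"{name}: python module available")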
@@ -101,17 +101,17 @@ Notes regarding this plot:
   transformed array is at around 600MB. Transforms on small arrays with small batch sizes
   could produce smaller performances, or better ones when fully cached.
 * a number of blue + (CuFFT) are actually performed as radix-N transforms with 7<N<127 (e.g. 11)
-  -hence the performance similar to the blue dots- but the actual supported radix transforms
+  -hence the performance similar to the blue dots- but the list of supported radix transforms
   is undocumented so they are not correctly labeled.
 
 The general results are:
 
 * vkFFT throughput is similar to cuFFT up to N=1024. For N>1024 vkFFT is much more
   efficient than cuFFT due to the smaller number of read and write per FFT axis
   (apart from isolated radix-2 3 sizes)
-* the OpenCL and CUDA backends of vkFFT perform similarly, as expected. [Note that this should
-  be true *as long as the card is only used for computing*. If it is also used for display,
-  then performance may be different, e.g. for nvidia cards opencl performance is more affected
+* the OpenCL and CUDA backends of vkFFT perform similarly, though there are ranges
+  where CUDA performs better. [Note that if the card is also used for display,
+  then difference can increase, e.g. for nvidia cards opencl performance is more affected
   when being used for display than the cuda backend]
 * clFFT (via gpyfft) generally performs much worse than the other transforms, though this was
   tested using nVidia cards. (Note that the clFFT/gpyfft benchmark tries all FFT axis permutations
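[Editor's note] To make the "radix-N" remark in the notes above concrete, here is a small illustrative helper (not part of pyvkfft) that factorises a transform length into primes: a length is a "radix" size when all of its prime factors belong to the set of radix kernels a library implements. The radix set used below (2, 3, 5, 7, 11, 13) is an assumption for illustration only, not an official vkFFT or cuFFT list.

    # Illustrative sketch: decide whether an FFT length is a "radix-N" size,
    # i.e. a product of small primes from a given radix set. The radix set
    # below is an assumption for illustration, not an official list.

    def prime_factors(n):
        """Return the prime factorisation of n (n >= 2) as a list."""
        factors, p = [], 2
        while p * p <= n:
            while n % p == 0:
                factors.append(p)
                n //= p
            p += 1
        if n > 1:
            factors.append(n)
        return factors

    def is_radix_size(n, radices=(2, 3, 5, 7, 11, 13)):
        """True if every prime factor of n is in the given radix set."""
        return all(f in radices for f in prime_factors(n))

    # Example: 1024 = 2**10 (radix), 1100 = 2*2*5*5*11 (radix only if 11 is
    # supported), 1009 is prime and would need a non-radix (e.g. Bluestein-type)
    # algorithm.
    for n in (1024, 1100, 1009):
        print(n, prime_factors(n), is_radix_size(n))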
