
Profiling Oceananigans #162


Closed
ali-ramadhan opened this issue Apr 3, 2019 · 2 comments
Labels
GPU 👾 Where Oceananigans gets its powers from performance 🏍️ So we can get the wrong answer even faster

Comments

@ali-ramadhan
Member

It might be good to start learning how to properly profile Oceananigans. It is slowing down a bit as we add more features (see #147 (comment)).

We can easily profile it on a CPU to see where the code spends most of its time, and maybe find some easy optimizations, before profiling it on a GPU, where the bottlenecks might be less obvious.

Some useful links:

@ali-ramadhan ali-ramadhan added performance 🏍️ So we can get the wrong answer even faster GPU 👾 Where Oceananigans gets its powers from labels Apr 3, 2019
@ali-ramadhan ali-ramadhan added this to the v1.0 milestone Apr 3, 2019
@vchuravy
Collaborator

vchuravy commented Apr 3, 2019

> JuliaLang/julia#4483 (might have to patch LLVM, could be a pain to get working).

You only have to build Julia+LLVM from scratch.

@ali-ramadhan
Member Author

We know how to do this now!

==22045== NVPROF is profiling process 22045, command: /ccsopen/proj/gen126/dd_salt_fingers/julia-1.2.0-rc1/bin/julia --project prof.jl
[ Info: Building the CUDAnative run-time library for your sm_70 device, this might take a while...
==22045== Profiling application: /ccsopen/proj/gen126/dd_salt_fingers/julia-1.2.0-rc1/bin/julia --project prof.jl
==22045== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   50.19%  201.34ms        10  20.134ms  18.875ms  21.592ms  ptxcall_calculate_interior_source_terms__32
                    6.78%  27.179ms        40  679.48us  659.87us  704.16us  void regular_fft<unsigned int=256, unsigned int=1, unsigned int=16, unsigned int=8, unsigned int=1, unsigned int=0, unsigned int=2, unsigned int=1, unsigned int=1, unsigned int=1, unsigned int, double>(kernel_arguments_t<unsigned int>)
                    6.70%  26.872ms        10  2.6872ms  2.6792ms  2.6905ms  ptxcall_adams_bashforth_update_source_terms__33
                    5.43%  21.784ms        10  2.1784ms  2.1585ms  2.1877ms  ptxcall_update_velocities_and_tracers__39
                    5.41%  21.709ms        10  2.1709ms  2.1273ms  2.1975ms  ptxcall_store_previous_source_terms__30
                    5.15%  20.654ms        10  2.0654ms  2.0079ms  2.1441ms  ptxcall_calculate_poisson_right_hand_side__34
                    3.31%  13.272ms        20  663.61us  661.69us  665.41us  ptxcall_anonymous23_35
                    3.26%  13.079ms        20  653.97us  653.09us  656.48us  void vector_fft<unsigned int=256, unsigned int=1, unsigned int=8, unsigned int=2, unsigned int=1, unsigned int=0, unsigned int=2, unsigned int=1, unsigned int=1, unsigned int=0, unsigned int, double>(kernel_arguments_t<unsigned int>)
                    3.23%  12.973ms        20  648.64us  647.90us  649.37us  void scal_kernel_val<double2, double2, int=0>(cublasScalParamsVal<double2, double2>)
                    2.59%  10.376ms        10  1.0376ms  1.0250ms  1.0539ms  ptxcall_update_buoyancy__31
                    2.55%  10.234ms        10  1.0234ms  998.65us  1.0478ms  ptxcall_compute_w_from_continuity__40
                    2.42%  9.7250ms        10  972.50us  960.19us  991.52us  ptxcall_anonymous23_36
                    1.64%  6.5868ms        10  658.68us  657.60us  659.90us  ptxcall_f2___37
                    1.34%  5.3581ms        10  535.81us  534.08us  537.82us  ptxcall_idct_permute__38
                    0.00%  10.144us        10  1.0140us     960ns  1.0880us  [CUDA memcpy HtoD]
      API calls:   54.66%  386.73ms        10  38.673ms  33.355ms  40.957ms  cuMemcpyHtoD
                   45.18%  319.71ms       120  2.6643ms  11.019us  318.08ms  cuLaunchKernel
                    0.14%  988.73us        80  12.359us  9.9020us  18.951us  cudaLaunchKernel
                    0.01%  66.139us        90     734ns     578ns  2.3100us  cuFuncGetAttribute
                    0.01%  42.604us       120     355ns     275ns  1.0540us  cuCtxGetCurrent
                    0.00%  15.705us        60     261ns     205ns     699ns  cudaGetErrorString
                    0.00%  14.711us        40     367ns     259ns  1.8380us  cudaGetLastError
                    0.00%  3.3770us         1  3.3770us  3.3770us  3.3770us  cuDeviceGetCount
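To rank kernels from a summary like the one above (where `calculate_interior_source_terms` alone accounts for 50.19% of GPU time), a small script can help. The following is a minimal Python sketch, not part of nvprof or Oceananigans. It assumes the whitespace-separated column layout shown in this output (Time(%), Time, Calls, Avg, Min, Max, Name) and only handles the `s`, `ms`, `us` and `ns` units. The helpers `to_seconds` and `parse_gpu_activities` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical helper: unit suffixes as printed by the nvprof summary, in seconds.
_UNITS = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def to_seconds(value: str) -> float:
    """Convert an nvprof time string such as '201.34ms' or '960ns' to seconds."""
    m = re.fullmatch(r"([0-9.]+)(ms|us|ns|s)", value)
    if m is None:
        raise ValueError(f"unrecognised time: {value!r}")
    return float(m.group(1)) * _UNITS[m.group(2)]


def parse_gpu_activities(text: str):
    """Return (name, total_seconds, calls) for each 'GPU activities' row,
    sorted by total time, largest first.

    Assumes (as in the output above) that the GPU section ends where the
    'API calls:' section begins."""
    rows = []
    in_gpu = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("GPU activities:"):
            in_gpu = True
            stripped = stripped[len("GPU activities:"):].strip()
        elif stripped.startswith("API calls:"):
            break
        if not in_gpu or not stripped:
            continue
        # Columns: Time(%) Time Calls Avg Min Max Name; the name may contain
        # spaces (e.g. "[CUDA memcpy HtoD]"), so split at most 6 times.
        _pct, total, calls, _avg, _min, _max, name = stripped.split(maxsplit=6)
        rows.append((name, to_seconds(total), int(calls)))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# A few rows copied from the summary above.
sample = """\
 GPU activities:   50.19%  201.34ms        10  20.134ms  18.875ms  21.592ms  ptxcall_calculate_interior_source_terms__32
                    6.70%  26.872ms        10  2.6872ms  2.6792ms  2.6905ms  ptxcall_adams_bashforth_update_source_terms__33
                    0.00%  10.144us        10  1.0140us     960ns  1.0880us  [CUDA memcpy HtoD]
      API calls:   54.66%  386.73ms        10  38.673ms  33.355ms  40.957ms  cuMemcpyHtoD
"""

for name, seconds, calls in parse_gpu_activities(sample):
    print(f"{name:60s} {seconds * 1e3:9.3f} ms  {seconds / calls * 1e3:8.3f} ms/call")
```

Splitting with `maxsplit=6` keeps kernel names that contain spaces intact. Stopping at `API calls:` keeps host-side API time (for example the `cuMemcpyHtoD` rows) out of the kernel ranking.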

Tight compute kernels!
[Screenshot: oceananigans, 2019-06-03]


2 participants