This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Updating profiler tutorial to include new custom operator profiling #15403

Merged (14 commits) on Jul 3, 2019
9 changes: 9 additions & 0 deletions docs/tutorials/python/profiler.md
@@ -206,6 +206,15 @@ Let's zoom in to check the time taken by operators

The above picture visualizes the sequence in which the operators were executed and the time taken by each operator.

### Profiling Custom Operators
If the existing NDArray operators do not cover everything your model needs, MXNet supports [Custom Operators](https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/customop.html) that you can define in Python. The `forward()` and `backward()` methods of a custom operator contain two kinds of code: "pure Python" code (including NumPy operators) and "sub-operators" (NDArray operators called within `forward()` and `backward()`). MXNet can profile the execution time of both kinds without additional setup. Specifically, the MXNet profiler breaks a single custom operator call into a pure Python event and, if any sub-operators are called, several sub-operator events. Each of these events carries a prefix in its name: the name of the custom operator you called.

![Custom Operator Profiling Screenshot](https://cwiki.apache.org/confluence/download/attachments/118172065/image2019-6-14_15-23-42.png?version=1&modificationDate=1560551022000&api=v2)

As the screenshot shows, all custom operator-related events fall into the **Custom Operator** domain, where you can easily visualize the execution time of each segment of your custom operator. For example, we can see that `CustomAddTwo::sqrt` is a sub-operator of the custom operator `CustomAddTwo`, and we can see exactly when it is executed.
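
The naming convention described above also makes it straightforward to group events by custom operator when post-processing a list of event names. The snippet below is a minimal sketch in plain Python. Only `CustomAddTwo::sqrt` is taken from the screenshot; the other event names and the helper `group_by_custom_op` are hypothetical and used purely for illustration.

```python
from collections import defaultdict


def group_by_custom_op(event_names, separator="::"):
    """Group event names of the form '<custom op>::<segment>' by their prefix.

    Names without the separator are treated as ordinary (non-custom) operator
    events and are skipped.
    """
    groups = defaultdict(list)
    for name in event_names:
        prefix, found, segment = name.partition(separator)
        if found:
            groups[prefix].append(segment)
    return dict(groups)


# Illustrative event names; only 'CustomAddTwo::sqrt' appears in the text above.
event_names = [
    "CustomAddTwo::pure_python",  # assumed name for the pure Python event
    "CustomAddTwo::sqrt",         # sub-operator event from the screenshot
    "_plus_scalar",               # a regular operator event, no prefix
]

grouped = group_by_custom_op(event_names)
print(grouped)
```

Splitting on the first `::` keeps the custom operator name as the key, so every segment of one custom operator call ends up in the same group.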

Note that to see the information described above, you need to set `profile_imperative` to `True`, even when you are using custom operators in [symbolic mode](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html). The reason is that within custom operators, pure Python code and sub-operators are still called imperatively.

## Advanced: Using NVIDIA Profiling Tools

MXNet's Profiler is the recommended starting point for profiling MXNet code, but NVIDIA also provides several tools for low-level profiling of CUDA code: [NVProf](https://devblogs.nvidia.com/cuda-pro-tip-nvprof-your-handy-universal-gpu-profiler/), [Visual Profiler](https://developer.nvidia.com/nvidia-visual-profiler) and [Nsight Compute](https://developer.nvidia.com/nsight-compute). These tools can profile all kinds of executables, including Python scripts running MXNet. You can also use them in conjunction with the MXNet Profiler to see high-level information from MXNet alongside the low-level CUDA kernel information.