This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Commit d49445f

Zha0q1 authored and sandeepkrishnamurthy-dev committed
Updating profiler tutorial to include new custom operator profiling (#15403)
* update profiler tutorial
* Update profiler.md
* Update profiler.md
* Update profiler.md
* Update docs/tutorials/python/profiler.md Co-Authored-By: Aaron Markham <[email protected]>
* Update docs/tutorials/python/profiler.md Co-Authored-By: Aaron Markham <[email protected]>
* Update docs/tutorials/python/profiler.md Co-Authored-By: Aaron Markham <[email protected]>
* Update profiler.md: change image url to dmlc and add a code example
* Update profiler.md
* Update profiler.md
* Update profiler.md
* Update profiler.md
* Re-trigger build
* Update profiler.md
1 parent 3df3e2c commit d49445f

File tree

1 file changed: +75, −0 lines changed

docs/tutorials/python/profiler.md

Lines changed: 75 additions & 0 deletions
@@ -206,6 +206,81 @@ Let's zoom in to check the time taken by operators
The above picture visualizes the sequence in which the operators were executed and the time taken by each operator.

### Profiling Custom Operators

Should the existing NDArray operators fail to meet all your model's needs, MXNet supports [Custom Operators](https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/customop.html) that you can define in Python. The `forward()` and `backward()` methods of a custom operator contain two kinds of code: "pure Python" code (NumPy operations included) and "sub-operators" (NDArray operators called within `forward()` and `backward()`). MXNet can profile the execution time of both kinds without additional setup: the profiler breaks a single custom operator call into a pure Python event and, if there are any, several sub-operator events. All of these events share a prefix in their names, which is, conveniently, the name of the custom operator you called.

Let's try profiling custom operators with the following code example:

```python
import mxnet as mx
from mxnet import nd
from mxnet import profiler

class MyAddOne(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        self.assign(out_data[0], req[0], in_data[0] + 1)

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        self.assign(in_grad[0], req[0], out_grad[0])

@mx.operator.register('MyAddOne')
class CustomAddOneProp(mx.operator.CustomOpProp):
    def __init__(self):
        super(CustomAddOneProp, self).__init__(need_top_grad=True)

    def list_arguments(self):
        return ['data']

    def list_outputs(self):
        return ['output']

    def infer_shape(self, in_shape):
        return [in_shape[0]], [in_shape[0]], []

    def create_operator(self, ctx, shapes, dtypes):
        return MyAddOne()

inp = mx.nd.zeros(shape=(500, 500))

profiler.set_config(profile_all=True, continuous_dump=True)
profiler.set_state('run')

w = nd.Custom(inp, op_type="MyAddOne")

mx.nd.waitall()

profiler.set_state('stop')
profiler.dump()
```

Here, we have created a custom operator called `MyAddOne`; within its `forward()` function, we simply add one to the input. We can visualize the dump file in `chrome://tracing/`:

![Custom Operator Profiling Screenshot](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tutorials/python/profiler/profiler_output_custom_operator_chrome.png)

As the screenshot shows, all custom operator-related events fall into the **Custom Operator** domain, where we can easily visualize the execution time of each segment of `MyAddOne`. We can tell that `MyAddOne::pure_python` is executed first, and that `CopyCPU2CPU` and `_plus_scalar` are two "sub-operators" of `MyAddOne`, along with the sequence in which they are executed.
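Because the dump produced by `profiler.dump()` is a standard Chrome trace-event JSON file, you can also inspect it programmatically instead of (or in addition to) loading it in `chrome://tracing/`. Below is a minimal sketch that totals the duration of custom-operator events by name. Note the sample events here are hypothetical stand-ins for real profiler output, and the begin/end pairing assumes events of the same name do not overlap:

```python
# Hypothetical excerpt of a chrome trace dump; a real file comes from profiler.dump()
trace = {
    "traceEvents": [
        {"name": "MyAddOne::pure_python", "cat": "Custom Operator", "ph": "B", "ts": 100},
        {"name": "MyAddOne::_plus_scalar", "cat": "Custom Operator", "ph": "B", "ts": 150},
        {"name": "MyAddOne::_plus_scalar", "cat": "Custom Operator", "ph": "E", "ts": 300},
        {"name": "MyAddOne::pure_python", "cat": "Custom Operator", "ph": "E", "ts": 450},
    ]
}

def custom_op_durations(trace, prefix):
    """Pair up "B" (begin) and "E" (end) events whose names start with
    the custom operator's prefix, and return total duration per event name
    in microseconds."""
    starts, totals = {}, {}
    for ev in trace["traceEvents"]:
        if not ev["name"].startswith(prefix):
            continue
        if ev["ph"] == "B":
            starts[ev["name"]] = ev["ts"]
        elif ev["ph"] == "E":
            began = starts.pop(ev["name"])
            totals[ev["name"]] = totals.get(ev["name"], 0) + ev["ts"] - began
    return totals

print(custom_op_durations(trace, "MyAddOne"))
```

A script like this can be handy for regression-checking custom operator timings in CI, where opening a browser-based trace viewer is not an option.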

Please note that to see the information described above, you need to set `profile_imperative` to `True` even when you are using custom operators in [symbolic mode](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html) (refer to the code snippet below, which is the symbolic-mode equivalent of the example above). The reason is that within custom operators, pure Python code and sub-operators are still called imperatively.

```python
# Set profile_all to True
profiler.set_config(profile_all=True, aggregate_stats=True, continuous_dump=True)

# OR, explicitly set profile_symbolic and profile_imperative to True
profiler.set_config(profile_symbolic=True, profile_imperative=True,
                    aggregate_stats=True, continuous_dump=True)

profiler.set_state('run')

# Use symbolic mode
a = mx.symbol.Variable('a')
b = mx.symbol.Custom(data=a, op_type='MyAddOne')
c = b.bind(mx.cpu(), {'a': inp})
y = c.forward()

mx.nd.waitall()
profiler.set_state('stop')
profiler.dump()
```
## Advanced: Using NVIDIA Profiling Tools
MXNet's Profiler is the recommended starting point for profiling MXNet code, but NVIDIA also provides a couple of tools for low-level profiling of CUDA code: [NVProf](https://devblogs.nvidia.com/cuda-pro-tip-nvprof-your-handy-universal-gpu-profiler/), [Visual Profiler](https://developer.nvidia.com/nvidia-visual-profiler) and [Nsight Compute](https://developer.nvidia.com/nsight-compute). You can use these tools to profile all kinds of executables, so they can be used for profiling Python scripts running MXNet. And you can use these in conjunction with the MXNet Profiler to see high-level information from MXNet alongside the low-level CUDA kernel information.
