Operator Performance Regression on CPU #15429
Description
Follow up on dev list discussion:
We have found performance regressions in some operators using the operator benchmark module here:
https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
@sandeep-krishnamurthy has helped to run the benchmark and this is the training mode result:
https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
The above result uses training mode (autograd.record()) and measures both forward and backward time.
As most users use CPU for inference only, I have also run the scripts in inference mode to further investigate the impact on inference.
Please find the inference and training mode results here:
https://docs.google.com/spreadsheets/d/1_eezNWbrBAm3s3i6G1m0Rd3YYdTEnmKlYtn4klqdyN0/edit?usp=sharing
I have calculated the regression percentages and sorted them; thanks to @aaronmarkham for providing the first version.
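For anyone who wants to redo the calculation on their own runs, here is a minimal sketch of one way to compute and sort regression percentages. It is not the exact spreadsheet formula: it assumes the regression is the relative increase in time from the old version to the new one, and the operator names and timings are made-up placeholders, not measured results.

```python
def regression_pct(old_ms, new_ms):
    """Relative slowdown of new vs. old, in percent (positive = slower)."""
    if old_ms <= 0:
        raise ValueError("old_ms must be positive")
    return (new_ms - old_ms) / old_ms * 100.0


def rank_regressions(old_times, new_times):
    """Return (op, pct) pairs for ops present in both runs, worst first."""
    common = old_times.keys() & new_times.keys()
    pcts = [(op, regression_pct(old_times[op], new_times[op])) for op in common]
    return sorted(pcts, key=lambda item: item[1], reverse=True)


# Hypothetical timings in milliseconds, for illustration only.
old = {"op_a": 1.0, "op_b": 2.0, "op_c": 4.0}
new = {"op_a": 1.5, "op_b": 2.0, "op_c": 3.0}

ranked = rank_regressions(old, new)
for op, pct in ranked:
    print(f"{op}: {pct:+.1f}%")
```

Operators missing from either run are skipped rather than reported, so a renamed or removed operator does not show up as a regression.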
Although perf numbers vary between runs, we observe that the following commonly used operators are consistently slower.
We need to look into them, find the root cause, and fix them where possible.
- Dropout
- relu
- LeakyReLU
- dot
- element wise ops (mul, div, sub)
- broadcast ops (mul, sub)
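Because the numbers vary between runs, it helps to state what "consistently slower" means before comparing. The sketch below shows one possible criterion, not the one used for the list above: an operator is flagged only if every run on the new version is slower than the median of the old-version runs by more than a threshold. The function name, the 5% threshold, and the timings are all assumptions for illustration.

```python
from statistics import median


def consistently_slower(old_runs, new_runs, threshold_pct=5.0):
    """True if every new run exceeds the median old run by more than threshold_pct."""
    baseline = median(old_runs)
    if baseline <= 0:
        raise ValueError("baseline timing must be positive")
    return all((t - baseline) / baseline * 100.0 > threshold_pct for t in new_runs)


# Hypothetical per-run timings in milliseconds, for illustration only.
old_runs = [1.0, 1.1, 0.9]
steady_regression = [1.3, 1.25, 1.4]   # slower in every run
noisy_result = [1.3, 0.95, 1.2]        # one run is faster than the baseline

print(consistently_slower(old_runs, steady_regression))
print(consistently_slower(old_runs, noisy_result))
```

Requiring every run to clear the threshold is deliberately strict; a looser rule (for example, comparing medians on both sides) would flag more operators at the cost of more false positives.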
Some op regressions seem to happen only on the mxnet-mkl version (refer to the 4th sheet of the Google Sheet).
Environment:
AWS C5.18xLarge
Deep Learning Base AMI (Ubuntu) Version 18.1
Python 3.6
MXNet versions:
with MKLDNN
pip install mxnet-mkl==1.5.0b20190627
pip install mxnet-mkl==1.4.1
without MKLDNN
pip install mxnet==1.5.0b20190627
pip install mxnet==1.4.1
Note: nightly 20190627 contains the latest commit in v1.5.x
Scripts:
https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
Notes: you need to modify the scripts a bit to run them
- Requires Python 3.6. Add your scripts path to PYTHONPATH, then follow the instructions to run the benchmark on all operators.
- To run operators in inference mode, you need to set False at this line and change run_backward to False in all files under:
https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf/nd_operations
for example here.