Operator Performance Regression on CPU

Follow-up to the dev list discussion:

https://lists.apache.org/thread.html/154ef1e4010671e7375c7a7cbedb413d5a4a3677321488440fb32a3a@%3Cdev.mxnet.apache.org%3E

We have found performance regressions in some operators using the operator benchmark module:
https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf

@sandeep-krishnamurthy helped run the benchmark; these are the **training mode** results:
https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50

**The above results were collected in training mode (`autograd.record()`) and include both forward and backward time.**

Since most users use CPU for inference only, I also ran the scripts in inference mode to investigate the impact on inference.

Please find the **inference and training mode** results here:
https://docs.google.com/spreadsheets/d/1_eezNWbrBAm3s3i6G1m0Rd3YYdTEnmKlYtn4klqdyN0/edit?usp=sharing

I have calculated the regression percentage for each operator and sorted the results. Thanks to @aaronmarkham for providing the first version.
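The regression percentage and sorting step can be sketched as follows. This is a minimal illustration, assuming the percentage is the relative change in runtime from 1.4.1 to the 1.5.0 nightly; the exact formula used in the spreadsheet is not stated here, and the operator names and timings below are hypothetical placeholders, not measured results.

```python
def regression_percent(old_ms, new_ms):
    """Percent change in runtime from the old release to the new one.

    Positive values mean the new release is slower.
    """
    return (new_ms - old_ms) / old_ms * 100.0


# Hypothetical (old, new) timings in milliseconds, for illustration only.
timings = {
    "op_a": (2.0, 3.0),
    "op_b": (4.0, 3.0),
    "op_c": (1.0, 1.1),
}

# Sort so the largest slowdown comes first.
ranked = sorted(
    ((op, regression_percent(old, new)) for op, (old, new) in timings.items()),
    key=lambda item: item[1],
    reverse=True,
)

for op, pct in ranked:
    print(f"{op}: {pct:+.1f}%")
```

Sorting in descending order puts the operators with the largest slowdown at the top, which is how the list of operators to investigate was narrowed down.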

Although the perf numbers vary between runs, the following commonly used operators are consistently slower.

We need to investigate them and fix each one once its root cause is identified.

- [x] Dropout
- [x] relu
- [x] LeakyReLU
- [x] dot
- [x] element-wise ops (mul, div, sub)
- [x] broadcast ops (mul, sub)
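
One way to separate a consistent slowdown from run-to-run variance is to require the regression to appear in every paired run. The sketch below is only an illustration of that idea: the 5% threshold, the helper name, and the timings are assumptions made for the example, not the procedure or data used for the list above.

```python
def consistently_slower(old_runs, new_runs, threshold_pct=5.0):
    """Return True if every paired run is slower by more than threshold_pct.

    old_runs and new_runs are runtimes (same unit) from repeated benchmark
    runs of the old and new release, paired by position.
    """
    pairs = list(zip(old_runs, new_runs))
    if not pairs:
        return False
    return all((new - old) / old * 100.0 > threshold_pct for old, new in pairs)


# Hypothetical timings in milliseconds, for illustration only.
runs = {
    "op_a": ([2.0, 2.1, 1.9], [3.0, 3.1, 2.9]),  # slower in every run
    "op_b": ([1.0, 1.0, 1.0], [1.2, 0.9, 1.1]),  # slower in some runs only
}

flagged = [op for op, (old, new) in runs.items() if consistently_slower(old, new)]
print(flagged)
```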

Some of the regressions seem to occur only in the mxnet-mkl builds (see the 4th sheet of the Google Sheet).

Environment:

- AWS C5.18xLarge
- Deep Learning Base AMI (Ubuntu) Version 18.1
- Python 3.6

MXNet versions:
```
# With MKLDNN
pip install mxnet-mkl==1.5.0b20190627
pip install mxnet-mkl==1.4.1

# Without MKLDNN
pip install mxnet==1.5.0b20190627
pip install mxnet==1.4.1
```
Note: the 20190627 nightly contains the latest [commit](https://github.com/apache/incubator-mxnet/commit/582489cebc16a8af21738281dcaee5eee54e478d) on the v1.5.x branch.

Scripts:
https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
Notes: the scripts need small modifications before they can be run.
1. Python 3.6 is required, and your scripts path must be added to `PYTHONPATH`. Follow the [instructions to run the benchmark on all operators](https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf).
2. To run operators in inference mode, set the value at [this line](https://github.com/apache/incubator-mxnet/blob/master/benchmark/opperf/utils/op_registry_utils.py#L73) to `False`, and change `run_backward` to `False` in all files under
   https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf/nd_operations,
   for example [here](https://github.com/apache/incubator-mxnet/blob/master/benchmark/opperf/nd_operations/gemm_operators.py#L59).
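
For a quick check of a single operator in inference mode without editing the scripts, something like the sketch below may work. It assumes the `run_performance_test` helper shown in the opperf README, with the keyword arguments used there, and it assumes the MXNet repository root is on `PYTHONPATH` so that the `benchmark` package can be imported. The input shapes, warmup count, and run count are arbitrary choices for illustration.

```python
# Settings for a forward-only (inference mode) measurement.
# Values other than run_backward=False are illustrative assumptions.
INFERENCE_KWARGS = {
    "run_backward": False,  # skip the backward pass
    "dtype": "float32",
    "warmup": 10,
    "runs": 25,
}


def benchmark_dot_inference():
    """Benchmark nd.dot forward-only on CPU with the opperf utility."""
    # Imported lazily so the settings above can be inspected without MXNet.
    import mxnet as mx
    from mxnet import nd
    from benchmark.opperf.utils.benchmark_utils import run_performance_test

    return run_performance_test(
        nd.dot,
        inputs=[{"lhs": (1024, 1024), "rhs": (1024, 1024)}],
        ctx=mx.cpu(),
        **INFERENCE_KWARGS
    )
```

Calling `benchmark_dot_inference()` under each installed MXNet version and comparing the reported forward times gives a per-operator view of the same comparison.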


Operator Performance Regression on CPU #15429

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Operator Performance Regression on CPU #15429

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions