This repository was archived by the owner on Nov 17, 2023. It is now read-only.
LSTM and GRU layers without DNNL enabled give wrong gradients #17898
Open
Description
Currently, we have two implementations of RNN layers on the CPU backend:
- the native fusion implementation,
- the fusion enabled by the DNNL library (https://intel.github.io/mkl-dnn/dev_guide_rnn.html).
Both of them can be invoked from `mx.sym.RNN`, `mx.rnn.FusedRNNCell`, and `mx.gluon.rnn.LSTM/GRU/RNN`. The DNNL fusion provides a more efficient Forward and Backward, while the native one serves as a fallback for devices or environments that cannot use the DNNL library.
Recently, we found some problems that lead to incorrect gradient calculation in the native implementation. Just tracking the issue here; it will be fixed ASAP.
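
A minimal sketch (not part of the original report) of how the two CPU code paths can be exercised and their gradients compared. It assumes the `MXNET_USE_MKLDNN_RNN` environment variable toggles the DNNL-fused RNN kernel on CPU; run the script once with each value and diff the saved gradient files.

```python
# Sketch: run a fused LSTM on CPU, record input/parameter gradients, and save them
# so that a run with the native path ("0") can be compared against the DNNL path ("1").
import os
os.environ.setdefault("MXNET_USE_MKLDNN_RNN", "0")  # assumption: "0" = native, "1" = DNNL

import mxnet as mx
from mxnet import autograd, gluon, nd

mx.random.seed(0)
ctx = mx.cpu()

# Fused LSTM layer; default layout 'TNC': (seq_len, batch, input_size) -> (seq_len, batch, 2*hidden_size)
lstm = gluon.rnn.LSTM(hidden_size=8, num_layers=2, bidirectional=True)
lstm.initialize(mx.init.Xavier(), ctx=ctx)

x = nd.random.normal(shape=(5, 4, 3), ctx=ctx)
x.attach_grad()

with autograd.record():
    y = lstm(x)
    loss = y.sum()
loss.backward()

# Save the input gradient and all parameter gradients for offline comparison.
nd.save("grads_%s.nd" % os.environ["MXNET_USE_MKLDNN_RNN"],
        [x.grad] + [p.grad() for p in lstm.collect_params().values()])
```

With correct backends, the two runs should produce gradients that match within numerical tolerance; a significant mismatch points to the problem described above.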