Conversion from FP32 to Mixed Precision Models #14584
Description
API Addition
Users want to bring an FP32 model and convert it to a mixed-precision model to run inference on it. They want to use the model zoo to convert pretrained models in Python and other frontends. This can be done with Gluon models today by casting the inputs and the blocks, but the same cannot be done for symbolic models (json and params). I propose adding an API to convert FP32 models to FP16.
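For reference, the Gluon workflow mentioned above already works today; a minimal sketch (resnet50_v1 from the model zoo is used purely as an example):

```python
import mxnet as mx
from mxnet.gluon.model_zoo import vision

# Today's Gluon path: cast the block and the inputs to FP16
net = vision.resnet50_v1(pretrained=True)
net.cast("float16")

data = mx.nd.random.uniform(shape=(1, 3, 224, 224)).astype("float16")
out = net(data)  # inference runs in FP16
```

There is no equivalent for a symbol/params checkpoint, which is what the API below addresses.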
Considering the recent AMP work in progress in #14173, I think we should add the FP16 conversion API under the AMP namespace:
```python
amp.convert_model(sym, arg_params, aux_params, target_dtype="float16",
                  target_precision_ops=None, original_precision_ops=None,
                  widest_precision_ops=None, excluded_sym_names=None)
```
With target_precision_ops, original_precision_ops and widest_precision_ops, users should be able to override the defaults in the AMP lists.
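A minimal usage sketch, assuming the API lands roughly as proposed and that amp is importable as in #14173 (mxnet.contrib.amp); the checkpoint name and the excluded op name are placeholders:

```python
import mxnet as mx
from mxnet.contrib import amp

# Load a pretrained FP32 symbolic model (placeholder checkpoint name)
sym, arg_params, aux_params = mx.model.load_checkpoint("resnet-50", 0)

# Convert to mixed precision, keeping e.g. the final softmax in FP32
fp16_sym, fp16_args, fp16_auxs = amp.convert_model(
    sym, arg_params, aux_params,
    target_dtype="float16",
    excluded_sym_names=["softmax"])

# Bind and run inference as usual with the converted model
mod = mx.mod.Module(fp16_sym, label_names=None, context=mx.gpu(0))
mod.bind(data_shapes=[("data", (1, 3, 224, 224))], for_training=False)
mod.set_params(fp16_args, fp16_auxs)
mod.forward(mx.io.DataBatch([mx.nd.ones((1, 3, 224, 224), ctx=mx.gpu(0))]))
```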
Backend Changes
Additionally, add an NNVM pass in the backend. By default, this pass would use the AMP lists for FP16 ops, FP32 ops and widest-type casts to decide whether each op takes FP16 or FP32 inputs.
The pass will traverse the graph and insert amp_cast and amp_multicast layers around FP16 and FP32 ops, as sketched below.
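To illustrate the intended effect of the pass, here is a rough before/after on a two-op graph, written with the frontend symbol API (this assumes the amp_cast operator being added in #14173; in practice the insertion happens inside the NNVM pass, not by hand):

```python
import mxnet as mx

# Original FP32 graph: Convolution (FP16 list) followed by softmax (FP32 list)
data = mx.sym.Variable("data")
conv = mx.sym.Convolution(data, num_filter=64, kernel=(3, 3), name="conv0")
out  = mx.sym.softmax(conv, name="softmax")

# Roughly what the pass would produce: amp_cast inserted at the boundaries
data_fp16 = mx.sym.amp_cast(data, dtype="float16")           # cast input to FP16
conv_fp16 = mx.sym.Convolution(data_fp16, num_filter=64,
                               kernel=(3, 3), name="conv0")  # runs in FP16
back_fp32 = mx.sym.amp_cast(conv_fp16, dtype="float32")      # cast back to FP32
out_mixed = mx.sym.softmax(back_fp32, name="softmax")        # stays in FP32
```

Where an op takes multiple inputs, amp_multicast would be used instead so that all inputs are cast to the widest type among them.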
Planning to start working on the POC unless someone is already working on this.
@ptrendx @DickJC123 @pengzhao-intel @ZhennanQin @eric-haibin-lin @Caenorst
EDIT: Proposal posted on dev list: https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models