Description
PyTorch provides specialized cuDNN-backed operators for many common operations, such as `F.group_norm`.
In the speed benchmarks previously published by the keras-team, the torch backend of Keras is usually slower than native PyTorch. This may be related to the fact that many layer implementations in Keras do not use the fused operators that torch provides, and instead compose the computation from primitive ops.
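For illustration, here is a minimal sketch of the kind of gap we mean. The `composed_group_norm` function below is hypothetical, written for this issue to stand in for a backend-agnostic decomposed implementation; it is not taken from the Keras source. On a CUDA device the fused `F.group_norm` path is typically faster:

```python
import time
import torch
import torch.nn.functional as F

def composed_group_norm(x, num_groups, weight, bias, eps=1e-5):
    # Hypothetical decomposed implementation: reshape into groups,
    # compute moments with primitive ops, normalize, then apply affine.
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(dim=(2, 3, 4), keepdim=True)
    var = g.var(dim=(2, 3, 4), unbiased=False, keepdim=True)
    g = (g - mean) / torch.sqrt(var + eps)
    x = g.reshape(n, c, h, w)
    return x * weight.view(1, c, 1, 1) + bias.view(1, c, 1, 1)

def bench(fn, device, iters=100):
    # Rough timing helper; synchronize on CUDA for a fair comparison.
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(32, 64, 56, 56, device=device)
    weight = torch.ones(64, device=device)
    bias = torch.zeros(64, device=device)

    # Correctness check: both paths should agree numerically.
    ref = F.group_norm(x, 8, weight, bias)
    out = composed_group_norm(x, 8, weight, bias)
    print("allclose:", torch.allclose(ref, out, atol=1e-5))

    print("fused:   ", bench(lambda: F.group_norm(x, 8, weight, bias), device))
    print("composed:", bench(lambda: composed_group_norm(x, 8, weight, bias), device))
```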
Would it make sense to add torch-specific optimizations to the existing layer implementations, so that they dispatch to these fused operators and improve the inference performance of the torch backend?
If so, we can submit a PR.