Optimize InstanceNormGradientOp (#22288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22288
Optimize InstanceNormGradientOp for both NCHW and NHWC orders on CPU and GPU.
Benchmarks:
CPU with [N, C, H, W] = [128, 256, 56, 56]:
  NCHW order: 616ms -> 128ms
  NHWC order: 1612ms -> 174ms
GPU with [N, C, H, W] = [128, 256, 112, 112]:
  NCHW order: 6450ms -> 37ms
  NHWC order: 1419ms -> 82ms
Reviewed By: houseroad
Differential Revision: D16023630
fbshipit-source-id: 5af9bf1103cde2fc2bcb5cd5a057d039732f052e