Optimize InstanceNormOp forward (#22130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22130
Optimize the forward pass of InstanceNormOp on both CPU and GPU. Measured times (before -> after):
CPU, order = NHWC, N = 128, C = 256, H = 56, W = 56: 183 ms -> 115 ms (about 1.6x faster).
GPU, N = 256, C = 256, H = 112, W = 112:
  NCHW: 1475 ms -> 45 ms (about 33x faster)
  NHWC: 1597 ms -> 79 ms (about 20x faster)
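
For reference, a minimal NumPy sketch of what an instance-norm forward pass computes in NHWC order. This is not the code from this PR, and it says nothing about how the optimized kernels are written. The function name `instance_norm_nhwc` and the default `eps` are assumptions made for illustration.

```python
import numpy as np

def instance_norm_nhwc(x, scale, bias, eps=1e-5):
    """Reference instance norm for x of shape (N, H, W, C).

    Hypothetical helper for illustration only; eps=1e-5 is an assumed default.
    """
    # Statistics are taken per sample and per channel, i.e. over H and W only.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)  # biased (population) variance
    # scale and bias have shape (C,) and broadcast over the last axis.
    return (x - mean) / np.sqrt(var + eps) * scale + bias

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 4, 3)).astype(np.float32)
y = instance_norm_nhwc(x, np.ones(3, np.float32), np.zeros(3, np.float32))
```

With unit scale and zero bias, each (sample, channel) slice of `y` has approximately zero mean and unit variance, which is a convenient check when comparing an optimized kernel against a reference.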
Reviewed By: houseroad
Differential Revision: D15963711
fbshipit-source-id: 3fa03109326456b9f301514fecbefa7809438d3e