Optimize GroupNorm on CUDA (#28204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28204
Optimize GroupNorm on CUDA
ghstack-source-id: 105388365
Test Plan: buck test mode/dev-nosan caffe2/test:nn -- "GroupNorm"
Reviewed By: houseroad
Differential Revision: D17923732
fbshipit-source-id: 9afaf01288bd9d273eed89909bff77243df89e9f