Add channel shuffle op fp32 + quantized. (#36815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36815
PyTorch does not have a native channel shuffle op.
This diff adds one for both fp32 and quantized tensors.
The fp32 implementation is a naive, inefficient one; for quantized tensors
there is a native QNNPACK op for this.
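For reference, a minimal sketch of the channel shuffle semantics the new op is
expected to follow (eager-mode Python; the operator name exposed by this diff
is assumed here to be torch.channel_shuffle, which may differ):

    import torch

    def channel_shuffle_reference(x, groups):
        # Split the channel dim into (groups, channels_per_group), swap the
        # two, and flatten back, interleaving channels across groups.
        n, c, h, w = x.shape
        assert c % groups == 0
        x = x.view(n, groups, c // groups, h, w)
        x = x.transpose(1, 2).contiguous()
        return x.view(n, c, h, w)

    # Hypothetical check against the op added by this diff:
    x = torch.randn(2, 6, 4, 4)
    assert torch.equal(channel_shuffle_reference(x, 3),
                       torch.channel_shuffle(x, 3))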
ghstack-source-id: 103267234
Test Plan:
buck run caffe2/test:quantization -- quantization.test_quantized.TestQuantizedOps.test_channel_shuffle
The x86 implementation in QNNPACK is SSE2, so this may not be the most
efficient path on x86.
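A quick quantized sanity check along the same lines (a hedged sketch, assuming
the op accepts quantized tensors directly and the qnnpack engine is available
in the build):

    import torch

    torch.backends.quantized.engine = "qnnpack"
    x = torch.rand(1, 8, 4, 4)
    qx = torch.quantize_per_tensor(x, scale=1.0 / 255, zero_point=0,
                                   dtype=torch.quint8)
    qy = torch.channel_shuffle(qx, 2)  # should route to the QNNPACK op
    print(qy.dequantize().shape)       # torch.Size([1, 8, 4, 4])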
Reviewed By: dreiss
Differential Revision: D21093841
fbshipit-source-id: 5282945f352df43fdffaa8544fe34dba99a5b97e