Improve performance of Int8SpatialBN (needed for DF4 quantization) (#19702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19702
Add an AVX2 implementation of the core compute for Int8SpatialBN.
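For context, a scalar sketch of the kind of per-channel computation such a kernel would vectorize. This is not the code from this PR: the function name QuantizedSpatialBNRef, its parameters, the NHWC layout, and the folding of the batch-norm and quantization parameters into per-channel alpha/beta are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference (an assumption, not the PR's implementation).
// x holds uint8 activations in NHWC order, so the channel index is i % channels.
std::vector<std::uint8_t> QuantizedSpatialBNRef(
    const std::vector<std::uint8_t>& x, std::size_t channels,
    float x_scale, std::int32_t x_zero_point,
    float y_scale, std::int32_t y_zero_point,
    const std::vector<float>& gamma, const std::vector<float>& bias,
    const std::vector<float>& mean, const std::vector<float>& var,
    float epsilon) {
  assert(channels > 0 && x.size() % channels == 0);
  assert(gamma.size() == channels && bias.size() == channels);
  assert(mean.size() == channels && var.size() == channels);

  // Fold BN statistics and both quantization scales into one multiply-add
  // per element: y_real / y_scale = alpha[c] * (x_q - x_zero_point) + beta[c].
  std::vector<float> alpha(channels), beta(channels);
  for (std::size_t c = 0; c < channels; ++c) {
    const float inv_std = 1.0f / std::sqrt(var[c] + epsilon);
    alpha[c] = x_scale * gamma[c] * inv_std / y_scale;
    beta[c] = (bias[c] - mean[c] * gamma[c] * inv_std) / y_scale;
  }

  std::vector<std::uint8_t> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const std::size_t c = i % channels;
    const float centered =
        static_cast<float>(static_cast<std::int32_t>(x[i]) - x_zero_point);
    // Round to nearest, shift by the output zero point, saturate to uint8.
    const long q = std::lrintf(alpha[c] * centered + beta[c]) + y_zero_point;
    y[i] = static_cast<std::uint8_t>(std::min(255L, std::max(0L, q)));
  }
  return y;
}
```

With channels innermost, the inner loop is a per-channel multiply-add followed by round and saturate, which is the sort of loop that maps onto 8-wide float lanes under AVX2.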
Reviewed By: jianyuh
Differential Revision: D15073973
fbshipit-source-id: c30b0c621348ba9331ba5e48b281c00cf6e479a1