Add avx2 integer horizontal sum and sum of squares to vec256 qint types (#35693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35693
Adds utility functions to the quantized int types of vec256 that compute
horizontal sums and sums of squares using AVX2 intrinsics.
This is useful for quantized implementations of various normalization
layers (LayerNorm, GroupNorm, InstanceNorm), where we need to calculate
the mean and variance of a layer of quantized ints.
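For illustration only, here is a minimal sketch of the idea on a plain
uint8_t buffer. It is not the code added by this PR: the names sum_u8,
sum_sq_u8, mean_var_u8 and MeanVar are hypothetical, the choice of
intrinsics is an assumption, and a scalar loop is used when the compiler
is not targeting AVX2. The variance is derived as E[x^2] - E[x]^2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

#if defined(__AVX2__)
// Hypothetical helper: add the four 64-bit lanes of a 256-bit register.
static inline int64_t hsum_epi64(__m256i v) {
  int64_t lanes[4];
  _mm256_storeu_si256(reinterpret_cast<__m256i*>(lanes), v);
  return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
#endif

// Sum of n unsigned 8-bit values.
int64_t sum_u8(const uint8_t* data, size_t n) {
  size_t i = 0;
  int64_t total = 0;
#if defined(__AVX2__)
  const __m256i zero = _mm256_setzero_si256();
  __m256i acc = _mm256_setzero_si256();
  for (; i + 32 <= n; i += 32) {
    __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
    // |v - 0| summed per group of 8 bytes gives four 64-bit partial sums.
    acc = _mm256_add_epi64(acc, _mm256_sad_epu8(v, zero));
  }
  total += hsum_epi64(acc);
#endif
  for (; i < n; ++i) total += data[i];  // scalar tail (or full scalar path)
  return total;
}

// Sum of squares of n unsigned 8-bit values.
int64_t sum_sq_u8(const uint8_t* data, size_t n) {
  size_t i = 0;
  int64_t total = 0;
#if defined(__AVX2__)
  __m256i acc = _mm256_setzero_si256();
  for (; i + 16 <= n; i += 16) {
    __m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
    __m256i w = _mm256_cvtepu8_epi16(bytes);  // 16 x int16 in [0, 255]
    __m256i sq = _mm256_madd_epi16(w, w);     // 8 x int32, each a*a + b*b
    // Widen to 64 bits before accumulating so long inputs cannot overflow.
    acc = _mm256_add_epi64(
        acc, _mm256_cvtepi32_epi64(_mm256_castsi256_si128(sq)));
    acc = _mm256_add_epi64(
        acc, _mm256_cvtepi32_epi64(_mm256_extracti128_si256(sq, 1)));
  }
  total += hsum_epi64(acc);
#endif
  for (; i < n; ++i) total += static_cast<int64_t>(data[i]) * data[i];
  return total;
}

struct MeanVar {
  double mean;
  double var;
};

// Mean and (population) variance from the two sums above.
MeanVar mean_var_u8(const uint8_t* data, size_t n) {
  assert(n > 0);
  const double count = static_cast<double>(n);
  const double mean = static_cast<double>(sum_u8(data, n)) / count;
  const double mean_sq = static_cast<double>(sum_sq_u8(data, n)) / count;
  return MeanVar{mean, mean_sq - mean * mean};
}
```

A normalization kernel would call something like mean_var_u8 once per
normalized group and then rescale each element. Keeping the accumulation
in integers until the final division avoids per-element float conversion.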
Test Plan:
Ad hoc C++ tester for the correctness of the AVX2 functions:
https://gist.github.com/vkuzo/0380f450793cd5c05abbeacb6d3883ae
Run with:
```
-lstdc++ -mavx2 -lm -ldl -o main main.cpp && ./main
```
Integration and performance will be tested in the next PR in the stack,
where we will hook quantized LayerNorm up to these functions.
Imported from OSS
Differential Revision: D20768804
fbshipit-source-id: 4720dd358dde0dabbab8e1a33a67be55925d98f9