quant: add q_batchnorm_1d op (#42491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42491
Hooks up quantized batchnorm_1d to the quantized_bn kernel. Eager mode
hookup will be in a future PR, and graph mode should work after this PR.
Note: the implementation is currently ~2x slower than q_batch_norm2d on the
benchmark because we convert back to contiguous memory format at the end;
channels_last is only defined for tensors of rank >= 4, so a rank-3 BN1d
input cannot stay in that layout. If further optimization is needed, that
can be a separate PR (will need the NHWC folks to see if there is a
workaround). Meanwhile, having this is better than not having anything.
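For context on the memory-format limitation above, a minimal sketch (assuming PyTorch is available) of why a rank-3 (N, C, L) batchnorm_1d input cannot use channels_last directly:

```python
import torch

# channels_last is defined only for rank-4 (NCHW) tensors, so the
# rank-3 (N, C, L) input of batchnorm_1d cannot use it directly.
x3d = torch.randn(2, 3, 8)      # N, C, L
x4d = torch.randn(2, 3, 8, 8)   # N, C, H, W

# Rank-4: conversion to channels_last works.
x4d_cl = x4d.contiguous(memory_format=torch.channels_last)
print(x4d_cl.is_contiguous(memory_format=torch.channels_last))  # True

# Rank-3: the same conversion is rejected, which is why the kernel
# converts back to a contiguous layout at the end.
try:
    x3d.contiguous(memory_format=torch.channels_last)
except RuntimeError as err:
    print("rank-3 input rejected:", err)
```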
Context: There have been both internal and external requests for various
quantized BN1d use cases.
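For illustration, a hedged sketch of calling the new kernel directly through the op registry (the `quantized::batch_norm1d` name comes from this PR; the exact argument order is an assumption and may differ across versions):

```python
import torch

# Quantize a rank-3 (N, C, L) float input.
x = torch.randn(2, 3, 8)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

C = x.shape[1]
weight = torch.ones(C)   # per-channel affine parameters
bias = torch.zeros(C)
mean = torch.zeros(C)    # running statistics
var = torch.ones(C)

# batch_norm1d is the op hooked up in this PR; the signature below
# (eps, output_scale, output_zero_point trailing) is an assumption.
qy = torch.ops.quantized.batch_norm1d(
    qx, weight, bias, mean, var,
    1e-5,   # eps
    0.1,    # output_scale
    0,      # output_zero_point
)
print(qy.shape)  # same (N, C, L) shape as the input
```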
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_batch_norm_1d_2d_3d
python test/test_quantization.py TestQuantizedOps.test_batch_norm_1d_2d_3d_relu
python test/test_quantization.py TestQuantizeJitOps.test_qbatch_norm
# performance:
# https://gist.github.com/vkuzo/73a07c0f24c05f5804990d9ebfaecf5e
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D22926254
fbshipit-source-id: 2780e6a81cd13a7455f6ab6e5118c22850a97a12