[quant] Add a new fused MovingAvg Obs + FakeQuant operator(CPU) (#61570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61570
Fused operator that computes moving average min/max values (in-place) of the input tensor and fake-quantizes it.
It expects the qmin/qmax values to reflect the range of the quantized tensor (instead of reduce_range)
Motivation for adding this operator is for performance reasons, since moving the computation from python to C++/CUDA can increase the performance of QAT.
Test Plan:
python test/test_quantization.py TestFusedObsFakeQuant
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29682762
fbshipit-source-id: 28e4c50e77236d6976fe4b326c9a12103ed95840