[quant] adding memoryless observers for embeddingbag QAT work (#65699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65699
related to: https://github.com/pytorch/pytorch/pull/65443#discussion_r715132425
The QAT and PAT (pruning-aware training) support for embedding bags needs a memoryless observer to work properly. During training, the set of pruned and non-pruned weights keeps changing, which can significantly change the quantization parameters, so the observer must not retain statistics from earlier steps.
This PR adds a memoryless flag to the simpler observer classes. The moving-average observers are not changed, since those explicitly have memory.
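To illustrate what the memoryless flag is meant to do, here is a minimal pure-Python sketch. It is not the code in this PR: ToyMinMaxObserver, observe, and scale are hypothetical names, and the 8-bit range of 255 steps is an assumption made for the example. The idea shown is that a memoryless observer discards its running min/max before each new observation, so the quantization range follows the current weights only.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyMinMaxObserver:
    """Hypothetical stand-in for a min/max observer (not the PyTorch class)."""

    memoryless: bool = False
    min_val: float = math.inf
    max_val: float = -math.inf

    def reset_min_max_vals(self) -> None:
        # Forget everything seen so far.
        self.min_val = math.inf
        self.max_val = -math.inf

    def observe(self, values: list[float]) -> None:
        # A memoryless observer starts from scratch on every call;
        # a regular one keeps widening its running range.
        if self.memoryless:
            self.reset_min_max_vals()
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def scale(self, num_steps: int = 255) -> float:
        # Assumed affine 8-bit scheme: the range always includes zero.
        low = min(self.min_val, 0.0)
        high = max(self.max_val, 0.0)
        return (high - low) / num_steps


dense_weights = [-4.0, 2.0, 4.0]    # before pruning
pruned_weights = [-0.5, 0.0, 0.5]   # after the large weights are pruned

running = ToyMinMaxObserver(memoryless=False)
fresh = ToyMinMaxObserver(memoryless=True)
for weights in (dense_weights, pruned_weights):
    running.observe(weights)
    fresh.observe(weights)

print(running.min_val, running.max_val)  # -4.0 4.0 (stale range is kept)
print(fresh.min_val, fresh.max_val)      # -0.5 0.5 (tracks current weights)
```

In this sketch the regular observer keeps the stale range [-4.0, 4.0] after pruning, so its scale is eight times coarser than that of the memoryless observer.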
In addition to the above, I changed the reset_min_max_vals function of
MinMaxObserver so that it preserves the device of the existing
self.min_val and self.max_val. Previously the reset did not preserve
the device, unlike initialization, which uses factory_kwargs.
Test Plan:
python test/test_quantization.py TestObserver
(added test_memoryless_minmaxobserver, test_memoryless_per_channel_minmaxobserver, test_memoryless_histogramobserver)
Imported from OSS
Reviewed By: supriyar
Differential Revision: D31209773
fbshipit-source-id: 44a63298e44880fbd3576f49ac568e781f3fd79a