Augmenting Observers to Support Dynamic Quantization Range (#41113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41113
In this diff, the `ObserverBase` class is augmented with 2 additional optional arguments qmin and qmax. Correspondingly the calculation of qmin and qmax and the related quantization parameters are modified to accommodate this additional flexibility should the number of bits for quantization be lower than 8 (the default value).
Additional logic in the base class `_calculate_qparams` function has also been modified to provide support for dynamic quantization range.
Test Plan:
To ensure this modification is still backward compatible with past usages, numerics are verified by running the quantization unit test suite, which contains various observer tests. The following command executes the test suite, which also verifies the observer numerics:
`buck test //caffe2/test:quantization -- observer`
This modified observer script can be tested within the experiments for lower bit fake quantization. Please see the following diffs for reference.
- Single Fake Quantizer: D22337447
- Single Conv Layer: D22338532
Reviewed By: z-a-f
Differential Revision: D22427134
fbshipit-source-id: f405e633289322078b0f4a417f54b684adff2549