[Quant][fx] Add quant and scale ranges to BackendConfig (#85200)
**Summary:** This commit adds the following constraints to
BackendConfig:
quant_min_lower_bound
quant_max_upper_bound
scale_min_lower_bound
scale_max_upper_bound
This is motivated by QNNPACK constraints on qint8 weight
values and the min scale value. Actually enforcing these
constraints in the QNNPACK BackendConfig will follow in a
future commit.
Today, users can also specify the above constraints through
QConfigs, and these settings may not necessarily match the
ones specified in the BackendConfig. In this case, we will
handle the discrepancy as follows:
(1) Require QConfig quant ranges to fall within the backend's
(2) Require QConfig min scale value (eps) >= backend's
(3) Require QConfig to specify quant range if the backend
specified one
(4) Require QConfig to specify min scale value (eps) if the
backend specified one
Public API changes:
* Previous API, still supported after this commit:
```
dtype_config = DTypeConfig(
input_dtype=torch.quint8,
output_dtype=torch.quint8,
weight_dtype=torch.qint8,
bias_dtype=torch.float,
)
```
* New API:
```
dtype_config = DTypeConfig(
input_dtype=DTypeWithConstraints(
dtype=torch.quint8,
quant_min_lower_bound=0,
quant_max_upper_bound=127,
scale_min_lower_bound=2 ** -12,
),
output_dtype=DTypeWithConstraints(
dtype=torch.quint8,
quant_min_lower_bound=0,
quant_max_upper_bound=127,
scale_min_lower_bound=2 ** -12,
),
weight_dtype=DTypeWithConstraints(
dtype=torch.qint8,
quant_min_lower_bound=-128,
quant_max_upper_bound=127,
scale_min_lower_bound=2 ** -12,
),
bias_dtype=torch.float,
)
```
* Additionally, the following `DTypeConfig` attributes
have new types with helper getters:
```
# These have type DTypeWithConstraints
dtype_config.input_dtype
dtype_config.output_dtype
dtype_config.weight_dtype
# These return Optional[torch.dtype]
dtype_config.get_input_dtype()
dtype_config.get_output_dtype()
dtype_config.get_weight_dtype()
```
Note that scale_max is currently not used because there is
no existing mechanism to enforce this on the observer. In the
future, we can validate this as well if there is a use case.
**Test Plan:**
python test/test_quantization.py
TestBackendConfig.test_dtype_with_constraints
python test/test_quantization.py
TestQuantizeFx.test_backend_config_scale_min
python test/test_quantization.py
TestQuantizeFx.test_backend_config_quantization_range
**Reviewers:** jerryzh168, vkuzo
**Subscribers:** jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85200
Approved by: https://github.com/jerryzh168