pytorch
24fc680e - [Quant] Enable XNNPACK ops in QNNPACK BackendConfig (#85863)


Commit

1 year ago

[Quant] Enable XNNPACK ops in QNNPACK BackendConfig (#85863)

**Summary:** This commit enforces the following constraints on the QNNPACK BackendConfig:

- `quant_min_lower_bound` = -127 for qint8 weight
- `quant_max_upper_bound` = 127 for qint8 weight
- `scale_min_lower_bound` = 2 ** -12 for qint8 activations and weight

These constraints will enable users to use this BackendConfig with faster XNNPACK quantized ops. They are also consistent with the existing settings in `default_symmetric_qnnpack_qconfig` and its per_channel and QAT variants. For more detail on why these exact values were chosen, please see the description of https://github.com/pytorch/pytorch/pull/74396.

Note that there are currently no restrictions on the qscheme in DTypeConfig. This should be added in the future to further enforce the restriction that the weights must be quantized with either per_tensor_symmetric or per_channel_symmetric.

Existing default QConfigs such as `get_default_qconfig("qnnpack")` and `get_default_qat_qconfig("qnnpack")` will continue to be supported, but only for the existing dtypes, e.g. quint8 activations for weighted ops like linear and conv. In the future, we should revisit whether to enable XNNPACK ops using these QConfigs as well.

**Test Plan:**

python test/test_quantization.py TestQuantizeFx.test_qnnpack_backend_config

**Reviewers:** jerryzh168, vkuzo

**Subscribers:** jerryzh168, vkuzo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85863
Approved by: https://github.com/jerryzh168
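
To make the three bounds in the summary concrete, here is a minimal, torch-free sketch of how such constraints could be checked against a QConfig's observer settings. The `Constraints` class, the `satisfies` helper, and the two constants are hypothetical names made up for illustration. They are not PyTorch's implementation, and the comparison directions (observer `quant_min` at or above the lower bound, `quant_max` at or below the upper bound, `eps` at or above the scale lower bound) are an assumption based on the field names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Constraints:
    """Hypothetical stand-in for per-dtype bounds in a backend config."""
    quant_min_lower_bound: Optional[int] = None
    quant_max_upper_bound: Optional[int] = None
    scale_min_lower_bound: Optional[float] = None


# Values taken from the commit message; the constant names are illustrative.
QINT8_WEIGHT = Constraints(
    quant_min_lower_bound=-127,
    quant_max_upper_bound=127,
    scale_min_lower_bound=2 ** -12,
)
QINT8_ACTIVATION = Constraints(scale_min_lower_bound=2 ** -12)


def satisfies(c: Constraints, quant_min: int, quant_max: int, eps: float) -> bool:
    """Return True if the observer settings stay within every bound that is set."""
    if c.quant_min_lower_bound is not None and quant_min < c.quant_min_lower_bound:
        return False
    if c.quant_max_upper_bound is not None and quant_max > c.quant_max_upper_bound:
        return False
    # eps is the smallest scale the observer can produce (assumed semantics).
    if c.scale_min_lower_bound is not None and eps < c.scale_min_lower_bound:
        return False
    return True


if __name__ == "__main__":
    # Symmetric qint8 weight range [-127, 127] with eps = 2 ** -12 passes.
    print(satisfies(QINT8_WEIGHT, -127, 127, 2 ** -12))
    # The full qint8 range [-128, 127] violates the weight lower bound.
    print(satisfies(QINT8_WEIGHT, -128, 127, 2 ** -12))
```

Under these assumptions, a weight observer using the full qint8 range or a very small `eps` would fall outside the bounds, which matches the commit's note that the older default QConfigs remain supported only for the existing dtypes.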

Author

andrewor14

Committer

pytorchmergebot

Parents

d9421f81
