onnxruntime
335a2174 - feat(quantization): add ActivationRestrictedAsymmetric option (#28237)

26 days ago
### Description

Adds a new `ActivationRestrictedAsymmetric` extra-option to the Python quantization tools. When enabled, uint8 activation zero-points are snapped to either 0 (when `rmin >= 0`, e.g. post-ReLU/Sigmoid tensors) or 128 (when `rmin < 0`). The scale is recomputed so the dequantized range still covers `[rmin, rmax]` without clipping.

This restricted asymmetric mode is required by some hardware accelerators that only support these two zero-point values for uint8 quantization, without requiring the full restriction to symmetric (zero-point = 128 for all tensors).

### Motivation and Context

Fixes #21398. Existing options cover only fully symmetric (`ActivationSymmetric` → zero-point fixed at 128) or unrestricted asymmetric. There was no mode that picks the closer of {0, 128} per tensor based on its observed range.

### Changes

- `quant_utils.py`: new `snap_zero_point_to_uint8(rmin, rmax)` helper.
- `base_quantizer.py`: parse new `ActivationRestrictedAsymmetric` extra-option.
- `onnx_quantizer.py` and `qdq_quantizer.py`: apply snap after `compute_scale_zp` in the activation path. Guarded on `quant_type == UINT8 and not symmetric`. Weight and int8 paths are untouched.
- `quantize.py`: document the new option in the four `extra_options` docstrings.
- `test_symmetric_flag.py`: new `TestRestrictedAsymmetricFlag` covering three cases (positive range → zp=0, signed range → zp=128, and option-disabled regression).

### Testing

```
python -m pytest onnxruntime/test/python/quantization/test_symmetric_flag.py -v
```

All 7 tests pass (4 existing + 3 new). `lintrunner` is clean.
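
### Illustrative sketch

The snapping rule described above can be sketched in a few lines of Python. This is not the code from `quant_utils.py`: the function name `snap_zero_point_sketch`, its return shape, and the handling of a degenerate all-zero range are assumptions made for illustration. It assumes the usual uint8 dequantization `x = scale * (q - zero_point)` with `q` in `[0, 255]`.

```python
from __future__ import annotations


def snap_zero_point_sketch(rmin: float, rmax: float) -> tuple[float, int]:
    """Hypothetical helper: return (scale, zero_point) with zero_point in {0, 128}.

    The scale is chosen so that scale * (q - zero_point), for q in [0, 255],
    spans at least [rmin, rmax].
    """
    if rmin > rmax:
        raise ValueError("rmin must not exceed rmax")

    if rmin >= 0:
        # Non-negative range (e.g. after ReLU): q = 0 maps to 0.0,
        # q = 255 must reach rmax.
        zero_point = 0
        scale = rmax / 255.0
    else:
        # Signed range: q = 0 maps to -128 * scale, q = 255 to 127 * scale.
        # Take the larger of the two required scales so neither end clips.
        zero_point = 128
        scale = max(-rmin / 128.0, max(rmax, 0.0) / 127.0)

    if scale == 0.0:
        # Assumption: fall back to a unit scale for an all-zero range.
        scale = 1.0
    return scale, zero_point


if __name__ == "__main__":
    for lo, hi in [(0.0, 6.0), (-3.0, 1.0), (-1.0, 1.0)]:
        scale, zp = snap_zero_point_sketch(lo, hi)
        print(f"range=({lo}, {hi}) scale={scale:.6f} zero_point={zp}")
```

In the signed case, taking the maximum of the two candidate scales trades some resolution on the shorter side of the range for the guarantee that neither `rmin` nor `rmax` is clipped.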