feat(quantization): add ActivationRestrictedAsymmetric option (#28237)
### Description
Adds a new `ActivationRestrictedAsymmetric` extra-option to the Python
quantization tools. When enabled, uint8 activation zero-points are
snapped to either 0 (when `rmin >= 0`, e.g. post-ReLU/Sigmoid tensors)
or 128 (when `rmin < 0`). The scale is recomputed so the dequantized
range still covers `[rmin, rmax]` without clipping.
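
For reviewers, the snapping rule can be sketched in a few lines. This is an illustration only, not the code in `quant_utils.py`: the function name `snap_uint8_zero_point_sketch`, its return shape, and the fallback scale of 1.0 for a degenerate all-zero range are assumptions made for the example.

```python
from __future__ import annotations


def snap_uint8_zero_point_sketch(rmin: float, rmax: float) -> tuple[float, int]:
    """Return (scale, zero_point) with zero_point restricted to {0, 128}.

    Hypothetical helper for illustration; the real implementation may differ.
    """
    if rmin > rmax:
        raise ValueError("rmin must not exceed rmax")

    if rmin >= 0:
        # Non-negative tensor: q=0 maps to 0.0 and q=255 maps to rmax.
        zero_point = 0
        scale = rmax / 255.0
    else:
        # Signed tensor: q=0 maps to -128*scale and q=255 maps to 127*scale.
        # Take the larger of the two candidate scales so neither end clips.
        zero_point = 128
        scale = max(-rmin / 128.0, rmax / 127.0)

    if scale == 0.0:
        # Assumption: fall back to 1.0 for an all-zero range.
        scale = 1.0
    return scale, zero_point


def dequantized_range(scale: float, zero_point: int) -> tuple[float, float]:
    """Real-valued interval representable by uint8 with this scale/zero-point."""
    return (0 - zero_point) * scale, (255 - zero_point) * scale
```

With `zero_point = 128`, the interval is slightly asymmetric (`-128 * scale` to `127 * scale`), which is why the sketch takes the maximum of the two candidate scales rather than dividing the full range by 255.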

This restricted asymmetric mode targets hardware accelerators that
support only these two zero-point values for uint8 quantization. It
lets them avoid falling back to fully symmetric quantization
(zero-point = 128 for all tensors).
### Motivation and Context
Fixes #21398.
Existing options cover only fully symmetric (`ActivationSymmetric` →
zero-point fixed at 128) or unrestricted asymmetric quantization. There
was no mode that picks one of {0, 128} per tensor based on its observed
range.
### Changes
- `quant_utils.py`: new `snap_zero_point_to_uint8(rmin, rmax)` helper.
- `base_quantizer.py`: parse the new `ActivationRestrictedAsymmetric`
  extra-option.
- `onnx_quantizer.py` and `qdq_quantizer.py`: apply the snap after
  `compute_scale_zp` in the activation path. Guarded on
  `quant_type == UINT8 and not symmetric`. Weight and int8 paths are
  untouched.
- `quantize.py`: document the new option in the four `extra_options`
  docstrings.
- `test_symmetric_flag.py`: new `TestRestrictedAsymmetricFlag` covering
  three cases (positive range → zp=0, signed range → zp=128, and
  option-disabled regression).
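
The guard described in the list above can be pictured roughly as follows. This is a hedged sketch, not the quantizer code: `maybe_snap_activation`, its keyword arguments, and the use of plain strings for the quantization type are all hypothetical stand-ins for whatever the quantizers actually pass around.

```python
from __future__ import annotations


def maybe_snap_activation(
    scale: float,
    zero_point: int,
    rmin: float,
    rmax: float,
    *,
    quant_type: str,              # hypothetical: "uint8" or "int8"
    symmetric: bool,
    restricted_asymmetric: bool,  # mirrors the new extra-option
) -> tuple[float, int]:
    """Snap only uint8, non-symmetric activations when the option is enabled."""
    if not (restricted_asymmetric and quant_type == "uint8" and not symmetric):
        # Option off, int8, or symmetric: keep the computed values unchanged.
        return scale, zero_point

    if rmin >= 0:
        new_scale, new_zp = rmax / 255.0, 0
    else:
        new_scale, new_zp = max(-rmin / 128.0, rmax / 127.0), 128

    # Assumption: keep the original scale if the observed range is all zeros.
    return (new_scale if new_scale > 0.0 else scale), new_zp
```

The three test cases map directly onto this shape: a positive range yields `zp=0`, a signed range yields `zp=128`, and with the option disabled the inputs come back unchanged.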
### Testing
```
python -m pytest onnxruntime/test/python/quantization/test_symmetric_flag.py -v
```
All 7 tests pass (4 existing + 3 new). `lintrunner` is clean.