onnxruntime
9984c702 - Register Flatten as Direct8Bit op in Python QDQ static quantizer (#28340)

9 hours ago
## Summary

- Register `Flatten` in `QLinearOpsRegistry` (`Direct8BitOp`) and `QDQRegistry` (`QDQDirect8BitOp`) so the Python QDQ quantizer no longer emits a redundant `DequantizeLinear -> Flatten -> QuantizeLinear` pair around layout-only `Flatten` nodes.
- Add `test_op_flatten.py` covering both `QOperator` and `QDQ` formats in u8/u8 and s8/s8.

## Motivation

Fixes #21375. The C++ runtime optimizer was updated in #21376 to drop redundant Q/DQ pairs around `Flatten`, mirroring its existing `Reshape` handling. The Python-layer quantizer was never updated, so `quantize_static` / `quantize_dynamic` still emitted the redundant Q/DQ pair around `Flatten` in QDQ format. This PR aligns the Python tooling with the runtime behavior.

`Flatten` is a layout-only op with no arithmetic; its quantization semantics are identical to `Reshape`, `Squeeze`, and `Unsqueeze`, all of which are already registered as `Direct8BitOp` / `QDQDirect8BitOp`.

## Changes

- `onnxruntime/python/tools/quantization/registry.py`: add `"Flatten": Direct8BitOp` to `QLinearOpsRegistry` and `"Flatten": QDQDirect8BitOp` to `QDQRegistry`.
- `onnxruntime/test/python/quantization/test_op_flatten.py`: new test file modeled on `test_op_reshape.py` covering u8/u8 and s8/s8 in both QOperator and QDQ formats; asserts no extra `QuantizeLinear`/`DequantizeLinear` is inserted around `Flatten`, and verifies numerical correctness via `check_model_correctness`.

## Test Plan

- `python -m pytest onnxruntime/test/python/quantization/test_op_flatten.py -v`: 2 tests pass.
- `python -m pytest onnxruntime/test/python/quantization/test_op_reshape.py -v`: regression, 2 tests pass.
- `python -m pytest onnxruntime/test/python/quantization/test_op_squeeze_unsqueeze.py -v`: regression, 2 tests pass.
- `lintrunner -a` clean on changed files.
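
The Motivation's claim that `Flatten` is layout-only, and therefore safe to run directly on 8-bit data, can be illustrated with a small dependency-free sketch. It does not use onnxruntime: `quantize`, `dequantize`, `flatten`, and `map_nested` are hypothetical helpers written for this illustration, the scale and zero point are arbitrary example values, and only the u8 case is modeled. It compares the redundant `DequantizeLinear -> Flatten -> QuantizeLinear` path against flattening the quantized values directly.

```python
# Illustrative sketch only: these helpers are NOT onnxruntime APIs.
# They model uint8 affine quantization and an ONNX-style Flatten on nested lists.

def quantize(x, scale, zero_point):
    """Affine-quantize one float to uint8: round, shift, clamp to [0, 255]."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))


def dequantize(q, scale, zero_point):
    """Map one uint8 value back to a float."""
    return (q - zero_point) * scale


def map_nested(fn, tensor):
    """Apply fn to every scalar in a nested list, preserving the nesting."""
    if isinstance(tensor, list):
        return [map_nested(fn, item) for item in tensor]
    return fn(tensor)


def flat_values(tensor):
    """Return all scalars of a nested list in row-major order."""
    if not isinstance(tensor, list):
        return [tensor]
    out = []
    for item in tensor:
        out.extend(flat_values(item))
    return out


def flatten(tensor, axis=1):
    """Flatten to 2D: rows = product of dims before axis, cols = the rest."""
    shape, probe = [], tensor
    while isinstance(probe, list):
        shape.append(len(probe))
        probe = probe[0]
    outer = 1
    for dim in shape[:axis]:
        outer *= dim
    values = flat_values(tensor)
    inner = len(values) // outer
    return [values[i * inner:(i + 1) * inner] for i in range(outer)]


SCALE, ZERO_POINT = 0.05, 128  # arbitrary example quantization parameters

# A 2x2x2 float tensor, quantized once at the graph input.
x = [[[0.1, -0.2], [0.35, 1.0]], [[-1.5, 2.0], [0.0, 0.7]]]
q_in = map_nested(lambda v: quantize(v, SCALE, ZERO_POINT), x)

# Path A (redundant): DequantizeLinear -> Flatten -> QuantizeLinear.
deq = map_nested(lambda q: dequantize(q, SCALE, ZERO_POINT), q_in)
path_a = map_nested(lambda v: quantize(v, SCALE, ZERO_POINT), flatten(deq))

# Path B (direct 8-bit): Flatten applied to the quantized values as-is.
path_b = flatten(q_in)

# Flatten only rearranges elements, so both paths yield identical uint8 data.
assert path_a == path_b
```

Because both paths agree while sharing the same scale and zero point, the surrounding Q/DQ pair adds nothing, which is the reasoning behind treating `Flatten` the same way as `Reshape`, `Squeeze`, and `Unsqueeze`.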