Register Flatten as Direct8Bit op in Python QDQ static quantizer (#28340)
## Summary
- Register `Flatten` in `QLinearOpsRegistry` (`Direct8BitOp`) and
`QDQRegistry` (`QDQDirect8BitOp`) so the Python QDQ quantizer no longer
wraps layout-only `Flatten` nodes in a redundant
`DequantizeLinear -> Flatten -> QuantizeLinear` sequence.
- Add `test_op_flatten.py` covering both `QOperator` and `QDQ` formats
in u8/u8 and s8/s8.
## Motivation
Fixes #21375. The C++ runtime optimizer was updated in #21376 to drop
redundant Q/DQ pairs around `Flatten`, mirroring its existing `Reshape`
handling. The Python-layer quantizer was never updated, so
`quantize_static` / `quantize_dynamic` still emitted the redundant Q/DQ
pair around `Flatten` in QDQ format. This PR aligns the Python tooling
with the runtime behavior.
`Flatten` is a layout-only op that performs no arithmetic, so its
quantization semantics are identical to those of `Reshape`, `Squeeze`,
and `Unsqueeze`, all of which are already registered as `Direct8BitOp` /
`QDQDirect8BitOp`.
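To illustrate why the Q/DQ pair around `Flatten` is redundant, here is a minimal pure-Python sketch. It is not code from this PR: the scale, zero point, tensor values, and helper names are all hypothetical, and `flatten_axis1` only mimics ONNX `Flatten` with `axis=1` on a 3-D nested list. Because `Flatten` only rearranges elements, dequantizing, flattening, and requantizing with the same parameters should give the same integers as flattening the quantized tensor directly.

```python
# Hypothetical uint8 quantization parameters, chosen only for illustration.
SCALE = 0.5
ZERO_POINT = 128


def dequantize(q):
    """Affine dequantization of one uint8 value."""
    return (q - ZERO_POINT) * SCALE


def quantize(x):
    """Affine quantization of one float to uint8, with saturation."""
    return max(0, min(255, round(x / SCALE) + ZERO_POINT))


def flatten_axis1(tensor3d):
    """Mimic ONNX Flatten(axis=1) on a nested list: (N, C, W) -> (N, C*W)."""
    return [[v for row in sample for v in row] for sample in tensor3d]


def map3d(fn, tensor3d):
    """Apply fn elementwise to a 3-D nested list."""
    return [[[fn(v) for v in row] for row in sample] for sample in tensor3d]


def map2d(fn, tensor2d):
    """Apply fn elementwise to a 2-D nested list."""
    return [[fn(v) for v in row] for row in tensor2d]


# Made-up quantized input of shape (2, 2, 2).
q_input = [[[0, 17], [128, 255]], [[64, 200], [1, 129]]]

# Redundant path: DequantizeLinear -> Flatten -> QuantizeLinear.
roundtrip = map2d(quantize, flatten_axis1(map3d(dequantize, q_input)))

# Direct path: Flatten applied to the 8-bit tensor as-is.
direct = flatten_axis1(q_input)

print(roundtrip == direct)
```

Both paths produce the same 8-bit values here, which is the property that lets a layout-only op run directly on quantized data.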
## Changes
- `onnxruntime/python/tools/quantization/registry.py`: add `"Flatten":
Direct8BitOp` to `QLinearOpsRegistry` and `"Flatten": QDQDirect8BitOp`
to `QDQRegistry`.
- `onnxruntime/test/python/quantization/test_op_flatten.py`: new test
file modeled on `test_op_reshape.py` covering u8/u8 and s8/s8 in both
QOperator and QDQ formats; asserts no extra
`QuantizeLinear`/`DequantizeLinear` is inserted around `Flatten`, and
verifies numerical correctness via `check_model_correctness`.
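The node-count assertion described above can be sketched roughly as follows. This is a generic illustration, not the helper used in `test_op_flatten.py`: the function names, the toy node list, and the expected counts are all hypothetical and do not reflect the graph the real test builds.

```python
from collections import Counter
from types import SimpleNamespace


def op_type_counts(nodes):
    """Count how often each op type appears in an iterable of nodes."""
    return Counter(node.op_type for node in nodes)


def op_count_mismatches(nodes, expected):
    """Return (op_type, expected, actual) tuples for every count that differs."""
    counts = op_type_counts(nodes)
    return [
        (op_type, want, counts.get(op_type, 0))
        for op_type, want in expected.items()
        if counts.get(op_type, 0) != want
    ]


# Made-up stand-in for a quantized graph; only the .op_type attribute matters.
toy_nodes = [
    SimpleNamespace(op_type=name)
    for name in (
        "QuantizeLinear",
        "DequantizeLinear",
        "MatMul",
        "QuantizeLinear",
        "DequantizeLinear",
        "Flatten",
    )
]

# Hypothetical expectation: no Q/DQ nodes beyond the two pairs listed above.
expected = {"Flatten": 1, "QuantizeLinear": 2, "DequantizeLinear": 2}
mismatches = op_count_mismatches(toy_nodes, expected)
print(mismatches)
```

With a real model, the same helper should accept `onnx.load(path).graph.node`, since each ONNX node exposes an `op_type` field.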
## Test Plan
- `python -m pytest onnxruntime/test/python/quantization/test_op_flatten.py -v`:
2 tests pass.
- `python -m pytest onnxruntime/test/python/quantization/test_op_reshape.py -v`:
regression, 2 tests pass.
- `python -m pytest onnxruntime/test/python/quantization/test_op_squeeze_unsqueeze.py -v`:
regression, 2 tests pass.
- `lintrunner -a` clean on changed files.