fix: propagate output_dtype attribute when inserting Q after DQ (#28144)
## Summary
- `MakeQAttrsFromDQ()` in `qdq_propagation.cc` only copied the DQ's
existing attributes (`axis`, `block_size`) when constructing the
inserted `QuantizeLinear` node, omitting `output_dtype`.
- For opset-21+ graphs whose DQ has no `zero_point` input, `qdq_util.cc`
falls back to `UINT8` when `output_dtype` is missing, silently
saturating negative `INT8` values to 0.
- Inject `output_dtype` derived from the DQ's input element type for
opset >= 21, leaving older opsets unchanged.
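The saturation described above can be reproduced with a small model of linear quantization. This is a minimal sketch, not ONNX Runtime code: `quantize_linear` is a hypothetical helper that applies round-then-clamp with a zero point of 0, and the scale and sample values are made up for illustration.

```python
def quantize_linear(x, scale, qmin, qmax, zero_point=0):
    """Hypothetical model of Q: round(x / scale) + zero_point, clamped to [qmin, qmax]."""
    q = round(x / scale) + zero_point  # Python's round() rounds half to even
    return max(qmin, min(qmax, q))

values = [-2.0, -0.5, 0.0, 1.5]  # illustrative inputs
scale = 0.5                      # illustrative scale

# Intended behavior: the propagated Q uses the DQ's INT8 input type.
int8_out = [quantize_linear(v, scale, -128, 127) for v in values]

# Fallback behavior: output_dtype is missing, so the Q is treated as UINT8.
uint8_out = [quantize_linear(v, scale, 0, 255) for v in values]

print(int8_out)   # [-4, -1, 0, 3]
print(uint8_out)  # [0, 0, 0, 3]
```

Every negative input collapses to 0 under the `UINT8` range, which matches the silent mismatch reported in the issue.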
## Motivation
Fixes #27845. Without this fix, enabling QDQ propagation
(`ORT_ENABLE_ALL`) produces different, silently incorrect outputs
compared with `ORT_DISABLE_ALL` for any `DequantizeLinear(int8) ->
Reshape/Transpose/...` pattern lacking a zero-point input.
## Changes
- `onnxruntime/core/optimizer/qdq_transformer/qdq_propagation.cc`: when
`dq_node.SinceVersion() >= 21`, read the DQ input's element type from
`InputDefs()[0]->TypeAsProto()` and inject it as `output_dtype` on the
propagated Q node. Also remove the stale `assert(SinceVersion() <= 21)`,
since `MatchDQNode()` accepts opsets up to 25.
- `onnxruntime/test/optimizer/qdq_transformer_test.cc`: new test
`QDQPropagation_DQForward_NoZP_OutputDtypeAttribute` parametrized across
`INT8`, `UINT8`, `INT16`, `UINT16`, asserting the inserted Q carries
`output_dtype` matching the DQ input type.
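The attribute logic described for `qdq_propagation.cc` can be modeled roughly as follows. This is a Python sketch of the decision only, not the actual C++ implementation: `make_q_attrs_from_dq` and its dict-based arguments are hypothetical stand-ins, while the integer constants are the ONNX `TensorProto` data type values for the four types the new test covers.

```python
# ONNX TensorProto.DataType enum values.
UINT8, INT8, UINT16, INT16 = 2, 3, 4, 5

def make_q_attrs_from_dq(dq_attrs, dq_since_version, dq_input_elem_type):
    """Hypothetical model: build the inserted Q node's attributes from the DQ node."""
    # Copy only the attributes the DQ actually carries.
    q_attrs = {k: dq_attrs[k] for k in ("axis", "block_size") if k in dq_attrs}
    # output_dtype exists on QuantizeLinear from opset 21 onward, so only
    # inject it there; older opsets keep their previous behavior.
    if dq_since_version >= 21:
        q_attrs["output_dtype"] = dq_input_elem_type
    return q_attrs

print(make_q_attrs_from_dq({"axis": 1}, 21, INT8))  # {'axis': 1, 'output_dtype': 3}
print(make_q_attrs_from_dq({"axis": 1}, 19, INT8))  # {'axis': 1}
```

Deriving `output_dtype` from the DQ's input type keeps the inserted Q symmetric with the DQ even when no `zero_point` input is available to infer the type from.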
## Test Plan
- `./onnxruntime_test_all
--gtest_filter="QDQTransformerTests.QDQPropagation*"` (covers both the
new test and the sibling
`QDQPropagation_QBackward_NoZP_OutputDtypeAttribute`).
- Existing CPU EP CI suites.
Fixes #27845