fix: skip DQ->MatMulNBits fusion when weight/scale initializer is shared (#28326)
## Summary
- Reject the DQ→MatMulNBits fusion when the weight or scale initializer
is shared by multiple consumers (e.g. tied-embedding pattern).
- Prevents a crash in `TransposeDQWeightsForMatMulNBits` ("Missing
required scale") when loading models like Whisper's
`decoder_model_merged_uint8.onnx`.
- Adds a regression test covering Int4x2 and UInt4x2 with and without
zero-points.
## Motivation
Fixes #28306.
ORT 1.25 regressed when loading quantized Whisper decoder models. PR #27769
broadened the `DQ→MatMulNBits` fusion, but its `CheckOutputEdges` guard
only checks DQ output edges; it does not catch the case where two DQ
nodes share the same weight and scale initializers at their inputs. The
first fusion consumes the shared initializers; the second fusion then
hits the assertion at `qdq_actions.cc:136` because the scale initializer
it needs has already been removed from the graph.
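
The failure mode described above can be sketched with a toy model. This is not ONNX Runtime code: `InitializerTable`, `ToyFuse`, and `SecondFusionSucceeds` are hypothetical names, and the "fusion" here only erases an entry from a map to mimic an action that consumes its inputs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph's initializer table; not an ORT type.
using InitializerTable = std::map<std::string, std::vector<float>>;

// Toy "fusion": consumes the named scale initializer by removing it from the
// table. Returns false when the initializer is already gone, which stands in
// for the "Missing required scale" failure.
bool ToyFuse(InitializerTable& initializers, const std::string& scale_name) {
  auto it = initializers.find(scale_name);
  if (it == initializers.end()) {
    return false;  // an earlier fusion already consumed it
  }
  initializers.erase(it);
  return true;
}

// Runs two fusions that both reference one shared scale initializer and
// reports whether the second one could still find it.
bool SecondFusionSucceeds() {
  InitializerTable initializers{{"shared_scale", {0.5f}}};
  const bool first = ToyFuse(initializers, "shared_scale");
  assert(first);  // the first fusion always finds the initializer
  return ToyFuse(initializers, "shared_scale");
}
```

In this sketch `SecondFusionSucceeds()` returns `false`, which corresponds to the second fusion finding that its scale is missing.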
## Changes
- `onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc`:
in `DQMatMulNodeGroupSelector::Check`, after the existing
`CheckOutputEdges` guard, return `false` if either the weight
initializer or the scale initializer has more than one consumer node.
This conservatively preserves both DQ nodes as `DequantizeLinear` +
`MatMul` rather than attempting a fusion whose first half would
invalidate the second.
- `onnxruntime/test/optimizer/qdq_matmulnbits_transformer_test.cc`: adds
`DQMatMulNotConvertedToMatMulNBits_SharedWeight` (4 variants:
Int4x2/UInt4x2 × with/without zero-points). Builds two DQ nodes pointing
at a shared weight + shared scale initializer, runs the QDQ transformer
at `TransformerLevel::Level2`, and asserts no `MatMulNBits` is emitted
(`MatMul`=2, `DequantizeLinear`=2, `MatMulNBits`=0). Without the fix the
second fusion crashes before the assertion runs, so the test fails on the
unfixed code and serves as a regression guard.
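
The consumer-count guard described above can be illustrated with a minimal, self-contained sketch. It does not use ONNX Runtime's graph API: `ToyNode`, `CountConsumers`, and `CanFuse` are hypothetical names, and the input layout (`inputs[0]` as weight, `inputs[1]` as scale) is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal node: a name plus the names of its inputs.
struct ToyNode {
  std::string name;
  std::vector<std::string> inputs;
};

// Counts how many nodes list `initializer_name` among their inputs.
std::size_t CountConsumers(const std::vector<ToyNode>& nodes,
                           const std::string& initializer_name) {
  return static_cast<std::size_t>(
      std::count_if(nodes.begin(), nodes.end(), [&](const ToyNode& n) {
        return std::find(n.inputs.begin(), n.inputs.end(),
                         initializer_name) != n.inputs.end();
      }));
}

// Sketch of the guard: allow fusion only when neither the weight nor the
// scale initializer of this DQ node is consumed by any other node.
bool CanFuse(const std::vector<ToyNode>& nodes, const ToyNode& dq) {
  assert(dq.inputs.size() >= 2);  // assumed layout: [0] = weight, [1] = scale
  return CountConsumers(nodes, dq.inputs[0]) <= 1 &&
         CountConsumers(nodes, dq.inputs[1]) <= 1;
}
```

With two DQ nodes that both read `W` and `S`, `CanFuse` returns `false` for each of them, so both stay unfused. Rejecting both nodes, rather than fusing only the first, keeps the outcome independent of the order in which nodes are visited.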
## Test Plan
- C++ unit tests: `DQMatMulNotConvertedToMatMulNBits_SharedWeight` is
included in `onnxruntime_test_all` and will run in CI.
- Local lintrunner: `lintrunner -a` clean on the diff.
- Manual: loading `onnx-community/whisper-tiny`
`decoder_model_merged_uint8.onnx` via `InferenceSession` no longer
asserts in `TransposeDQWeightsForMatMulNBits`.