fix: skip DQ->MatMulNBits fusion when weight/scale initializer is shared (#28326)

Commit

24 days ago

fix: skip DQ->MatMulNBits fusion when weight/scale initializer is shared (#28326)

## Summary

- Reject the DQ→MatMulNBits fusion when the weight or scale initializer is shared by multiple consumers (e.g. tied-embedding pattern).
- Prevents a crash in `TransposeDQWeightsForMatMulNBits` ("Missing required scale") when loading models like Whisper's `decoder_model_merged_uint8.onnx`.
- Adds a regression test covering Int4x2 and UInt4x2 with and without zero-points.

## Motivation

Fixes #28306. ORT 1.25 regressed on quantized Whisper decoder models. PR #27769 broadened the `DQ→MatMulNBits` fusion, but its `CheckOutputEdges` guard only checks DQ output edges; it does not catch the case where two DQ nodes share the same weight + scale initializers at their inputs. The first fusion consumes the shared initializer; the second fusion then asserts in `qdq_actions.cc:136` because the initializer has been removed from the graph.

## Changes

- `onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc`: in `DQMatMulNodeGroupSelector::Check`, after the existing `CheckOutputEdges` guard, return `false` if either the weight initializer or the scale initializer has more than one consumer node. This conservatively preserves both DQ nodes as `DequantizeLinear` + `MatMul` rather than attempting a fusion whose first half would invalidate the second.
- `onnxruntime/test/optimizer/qdq_matmulnbits_transformer_test.cc`: adds `DQMatMulNotConvertedToMatMulNBits_SharedWeight` (4 variants: Int4x2/UInt4x2 × with/without zero-points). Builds two DQ nodes pointing at a shared weight + shared scale initializer, runs the QDQ transformer at `TransformerLevel::Level2`, and asserts no `MatMulNBits` is emitted (`MatMul`=2, `DequantizeLinear`=2, `MatMulNBits`=0). Without the fix the second fusion crashes before the assertion runs, so this is a real regression guard.

## Test Plan

- C++ unit tests: `DQMatMulNotConvertedToMatMulNBits_SharedWeight` is included in `onnxruntime_test_all` and will run in CI.
- Local lintrunner: `lintrunner -a` clean on the diff.
- Manual: loading `onnx-community/whisper-tiny` `decoder_model_merged_uint8.onnx` via `InferenceSession` no longer asserts in `TransposeDQWeightsForMatMulNBits`.

Fixes #28306
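
The consumer-count guard described in the commit message can be illustrated with a minimal sketch on a toy graph model. Everything here is an assumption made for illustration: `ToyDQNode`, `ToyGraph`, `ConsumerCount`, `CanFuse`, and `CountFusable` are hypothetical names, not onnxruntime types or APIs, and the actual check lives in `DQMatMulNodeGroupSelector::Check`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a DequantizeLinear node; NOT an onnxruntime type.
struct ToyDQNode {
  std::string name;
  std::string weight;  // name of the quantized weight initializer
  std::string scale;   // name of the scale initializer
};

// Toy stand-in for a graph holding only DQ nodes; NOT an onnxruntime type.
struct ToyGraph {
  std::vector<ToyDQNode> dq_nodes;

  // Number of nodes that read the named initializer as weight or scale.
  int ConsumerCount(const std::string& initializer) const {
    int count = 0;
    for (const ToyDQNode& node : dq_nodes) {
      if (node.weight == initializer || node.scale == initializer) ++count;
    }
    return count;
  }
};

// Hypothetical guard: allow fusion only when this DQ node is the sole
// consumer of both its weight and its scale initializer.
bool CanFuse(const ToyGraph& graph, const ToyDQNode& dq) {
  return graph.ConsumerCount(dq.weight) == 1 &&
         graph.ConsumerCount(dq.scale) == 1;
}

// Counts the DQ nodes that pass the guard.
int CountFusable(const ToyGraph& graph) {
  int fusable = 0;
  for (const ToyDQNode& node : graph.dq_nodes) {
    if (CanFuse(graph, node)) ++fusable;
  }
  return fusable;
}

// Usage: a tied pair is rejected, an independent pair is accepted.
void CheckExample() {
  ToyGraph tied;  // two DQ nodes sharing weight "W" and scale "S"
  tied.dq_nodes.push_back({"dq_a", "W", "S"});
  tied.dq_nodes.push_back({"dq_b", "W", "S"});
  assert(CountFusable(tied) == 0);

  ToyGraph separate;  // each DQ node owns its initializers
  separate.dq_nodes.push_back({"dq_a", "W_a", "S_a"});
  separate.dq_nodes.push_back({"dq_b", "W_b", "S_b"});
  assert(CountFusable(separate) == 2);
}
```

In this sketch the tied pair mirrors the shape of the regression test: both DQ nodes are left unfused, which corresponds to the expected `MatMulNBits`=0 outcome. Rejecting both nodes, rather than fusing the first and skipping the second, matches the conservative behavior the commit message describes.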

References

#28326 - fix: skip DQ->MatMulNBits fusion when weight/scale initializer is shared

Author

Rishi-Dave

Parents

fafe5644

onnxruntime 183e7a94 - fix: skip DQ->MatMulNBits fusion when weight/scale initializer is shared (#28326)
