onnxruntime
0ca313b5 - Relax WeightBiasQuantization constraint for larger QDQ node group (#25673)

### Description
Relax the WeightBiasQuantization constraint so it covers a larger QDQ node group.

### Motivation and Context
The `WeightBiasQuantization` transformer quantizes float weights in the `Q -> DQ -> Conv/ConvTranspose/Gemm's Weights -> Q -> DQ` sequence. The check on `Weights -> Q` (`children_nodes.size() != 1 || children_nodes[0]->OpType() != QDQ::QOpName`) is a problem because it skips quantization for many common patterns, such as a `Conv` followed by an unfused activation (`DQ -> Conv -> ReLU -> Q`). Checking for the terminating Q here is actually unnecessary (the fold can happen anyway without changing model semantics). However, to minimize the behavior change, this PR simply extends the matched pattern to allow a single-path (no branching), type-preserving chain of nodes leading to a `Q`, enabling quantization in more cases.
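
Below is a minimal sketch of the relaxed "leads to Q" condition, assuming a simplified stand-alone `Node` type; the names `Node`, `op_type`, `outputs`, and the set of type-preserving ops are illustrative assumptions and do not correspond to the real onnxruntime graph API.

```cpp
// Sketch only: a simplified Node type, not onnxruntime's graph classes.
#include <string>
#include <unordered_set>
#include <vector>

struct Node {
  std::string op_type;
  std::vector<const Node*> outputs;  // consumers of this node's output
};

// Ops assumed here (for illustration) to preserve the tensor's element type.
static const std::unordered_set<std::string> kTypePreservingOps = {
    "Relu", "Clip", "MaxPool", "Reshape", "Transpose", "Squeeze", "Unsqueeze"};

// Old constraint (roughly): the node must have exactly one consumer,
// and that consumer must be a QuantizeLinear.
bool FeedsQDirectly(const Node& node) {
  return node.outputs.size() == 1 &&
         node.outputs[0]->op_type == "QuantizeLinear";
}

// Relaxed constraint: follow a single-consumer (no branch), type-preserving
// path downstream and accept if it eventually reaches a QuantizeLinear.
bool LeadsToQ(const Node& node) {
  const Node* current = &node;
  while (current->outputs.size() == 1) {
    const Node* next = current->outputs[0];
    if (next->op_type == "QuantizeLinear") return true;
    if (kTypePreservingOps.count(next->op_type) == 0) return false;
    current = next;
  }
  return false;  // branch or graph output reached before any Q
}
```

Under this sketch, a `DQ -> Conv -> ReLU -> Q` chain now qualifies: `ReLU` has a single consumer and is type-preserving, so the walk reaches the `Q`, whereas the old direct-child check would have rejected it.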