onnxruntime
d684d90c - [QNN EP] Fix Weight Bias Quantization implementation (#24693)

- ORT's GetAttributes() API call populates all node attributes, assigning default values to any attribute left unspecified in the NodeProto.
- Account for these defaults: do not skip weight_bias_quantization on a Conv operator when its weight is produced by a DequantizeLinear node whose block_size attribute reads 0.

### Description
The GetAttributes() API call populates every attribute in a node definition, assigning default values to attributes that are unspecified in the model. As a result, Weight Bias Quantization was being skipped whenever a block_size attribute was present on the DequantizeLinear node producing the weight of a Conv operator, even when it was only present because of default population. Gracefully handle the default value of block_size (0) and still apply Weight Bias Quantization, since the default value 0 has no significance.

### Motivation and Context
Applying Weight Bias Quantization to the Conv operator enables the ORT QDQ transformer to fold the DQ --> Conv --> Q pattern into a single Conv operator. This improves inference time for some QDQ ONNX models.
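To make the fix concrete, here is a minimal C++ sketch, not the actual ORT source: the helper name `UsesBlockedQuantization` is hypothetical, while `Node::GetAttributes()` and `AttributeProto::i()` are real ONNX Runtime APIs. It shows how the transform can distinguish a genuinely blocked DequantizeLinear from one where block_size merely carries its default of 0:

```cpp
#include <string>

#include "core/graph/graph.h"  // onnxruntime::Node, NodeAttributes

// Hypothetical helper: returns true only if the DequantizeLinear node
// actually uses blocked quantization. Because Node::GetAttributes()
// materializes default values for attributes absent from the NodeProto,
// "block_size" can appear in the map with its default of 0; that case
// must be treated the same as "attribute not set".
bool UsesBlockedQuantization(const onnxruntime::Node& dq_node) {
  const auto& attrs = dq_node.GetAttributes();
  const auto it = attrs.find("block_size");
  return it != attrs.end() && it->second.i() != 0;
}

// Sketch of the call site: skip weight_bias_quantization only when
// blocked quantization is real, otherwise fold DQ --> Conv --> Q as usual.
//   if (UsesBlockedQuantization(dq_node)) {
//     return;  // blocked dequantization: leave the pattern alone
//   }
```

The key design point is that mere presence of the block_size key in the attribute map is no longer a reliable signal once defaults are populated; the value must be compared against the ONNX-defined default of 0.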