onnxruntime
49d197a8 - Enable ClipQuantFusion exclusively on CPU EP (#20627)

Commit

1 year ago

Enable ClipQuantFusion exclusively on CPU EP (#20627)

### Motivation and Context

The Intel NPU does not support 16-bit integer quantized operators. Consequently, the execution provider removes the QuantizeLinear/DequantizeLinear (Q/DQ) operators from node units and executes the operations in FP16 in the backend.

However, if a Clip operator has been fused into a Q operator in a node unit, removing the Q/DQ operators introduces inaccuracies, because the clamping performed by the original Clip operator is lost. The fusion relies on Q saturating its output to the quantized integer range, which already performs the clamping and makes the Clip redundant. Once Q is stripped, nothing clamps the value.

Consider the following example:

- FP32 model: -> Op_FP32 -> Clip ->
- QDQ model: -> (DQ -> Op_FP32 -> Q) -> (DQ' -> Clip -> Q') ->
- After ClipQuantFusion: -> (DQ -> Op_FP32 -> Q) -> (DQ' -> Q') ->
- Intel execution provider strips Q/DQ: -> Op_FP16 ->

To solve this issue, we have enabled ClipQuantFusion exclusively on the CPU execution provider.
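The failure mode above can be illustrated numerically. The following is a minimal, hypothetical Python sketch, not ONNX Runtime code. It assumes int16 quantization with a power-of-two scale of 2^-12 (chosen so the arithmetic is exact), zero point 0, and a `Clip(-8, 8)`. It also assumes that the fusion's precondition is "Q's representable float range lies inside the Clip range." All function names (`quantize`, `clip_is_redundant`, `qdq_fused`, `stripped`, and so on) are illustrative.

```python
# Hypothetical illustration of why ClipQuantFusion breaks once Q/DQ are stripped.
# Not ONNX Runtime code; int16 range and the fusion precondition are assumptions.
INT16_MIN, INT16_MAX = -32768, 32767


def quantize(x: float, scale: float, zero_point: int) -> int:
    """QuantizeLinear to int16: round half to even, then saturate."""
    q = round(x / scale) + zero_point  # Python's round() rounds half to even
    return max(INT16_MIN, min(INT16_MAX, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """DequantizeLinear: (q - zero_point) * scale."""
    return (q - zero_point) * scale


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def clip_is_redundant(lo: float, hi: float, scale: float, zero_point: int) -> bool:
    """Assumed precondition: Q's saturated float range lies inside [lo, hi]."""
    q_lo = dequantize(INT16_MIN, scale, zero_point)
    q_hi = dequantize(INT16_MAX, scale, zero_point)
    return lo <= q_lo and q_hi <= hi


SCALE, ZP = 2.0 ** -12, 0  # power-of-two scale keeps every value exact
CLIP_LO, CLIP_HI = -8.0, 8.0


def qdq_with_clip(y: float) -> float:
    # y is the output of DQ'; apply Clip -> Q', then the consumer's DQ.
    return dequantize(quantize(clip(y, CLIP_LO, CLIP_HI), SCALE, ZP), SCALE, ZP)


def qdq_fused(y: float) -> float:
    # After ClipQuantFusion: DQ' -> Q'; Q' saturation does the clamping.
    return dequantize(quantize(y, SCALE, ZP), SCALE, ZP)


def stripped(y: float) -> float:
    # EP removed Q/DQ: the value flows through with no clamp at all.
    return y


print(clip_is_redundant(CLIP_LO, CLIP_HI, SCALE, ZP))  # True
for y in (-10.0, 0.5, 10.0):
    print(y, qdq_with_clip(y), qdq_fused(y), stripped(y))
# -10.0 -8.0 -8.0 -10.0
# 0.5 0.5 0.5 0.5
# 10.0 7.999755859375 7.999755859375 10.0
```

The fused graph matches the original QDQ graph for every input, because Q's saturation reproduces the clamp. The stripped path diverges for out-of-range inputs (10.0 instead of about 8.0), which is the inaccuracy that motivated restricting the fusion to the CPU EP.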

References

#20627 - Enable ClipQuantFusion exclusively on CPU EP

Author

yihonglyu

Parents

4fe565a6
