Pre-processing of Quantization (#12729)
Shape Inference and Model Optimization before Quantization
Model quantization with the QDQ format, i.e. inserting QuantizeLinear/DeQuantizeLinear nodes on
tensors, requires tensor shape information to perform at its best. Currently, shape inference
works best on an optimized model. As a result, it is highly recommended to run quantization
on an optimized model with shape information.
This change adds code that prepares a model for quantization in the following three steps:
1. Symbolic shape inference.
2. Model optimization.
3. ONNX shape inference.
At the same time, we recommend turning model optimization off during quantization itself, as
the optimization might change the computation graph, making it harder for the QDQ debugger
to locate matching tensors between the original and the quantized models.