onnxruntime
ff0ab0a8 - Quantize Weight for Gemm/Conv on Quantized Model (#22969)

Committed 345 days ago
Some quantized models have QDQ nodes around Conv/Gemm, but the weight and/or bias are left unquantized. This PR adds a WeightBiasQuantization optimizer that quantizes float weights to INT8 tensors and float biases to INT32 tensors. It applies only to weight and/or bias initializers, so that ConstantFolding can fold the resulting sub-graph into real quantized initializers during the next round of graph optimization.
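The quantization arithmetic described above can be sketched as follows. This is a minimal numpy illustration, not the actual onnxruntime implementation: it assumes symmetric per-tensor quantization for the weight, and the common convention that the bias scale is the product of the input and weight scales (zero point 0), so the bias can be added directly to the INT32 accumulator of Conv/Gemm. All function names here are illustrative.

```python
import numpy as np

def quantize_weight_int8(w):
    # Symmetric per-tensor quantization: map max |w| to 127, zero point 0.
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_bias_int32(b, input_scale, weight_scale):
    # Bias scale = input_scale * weight_scale, so the quantized bias
    # lives in the same fixed-point domain as the int32 accumulator.
    scale = input_scale * weight_scale
    return np.round(b / scale).astype(np.int32), scale

w = np.array([[0.5, -1.27], [0.1, 0.0]], dtype=np.float32)
q_w, w_scale = quantize_weight_int8(w)      # int8 weight + its scale

b = np.array([0.02, -0.01], dtype=np.float32)
q_b, b_scale = quantize_bias_int32(b, input_scale=0.05, weight_scale=w_scale)
```

With these inputs the weight scale is 1.27 / 127 = 0.01, so `q_w` holds [[50, -127], [10, 0]]; the bias is quantized with scale 0.05 * 0.01 = 0.0005, giving [40, -20].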