onnxruntime
ff0ab0a8 - Quantize Weight for Gemm/Conv on Quantized Model (#22969)

Committed 345 days ago
Some quantized models have QDQ nodes around Conv/Gemm, but the weight and/or bias are left unquantized. This PR adds a WeightBiasQuantization optimizer that quantizes float weights to INT8 tensors and float biases to INT32 tensors. It applies only to weight and/or bias initializers, so that ConstantFolding can fold the resulting sub-graph into real quantized initializers during the next round of graph optimization.
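The quantization arithmetic described above can be sketched as follows. This is a minimal numpy illustration, not the actual onnxruntime implementation: it assumes symmetric per-tensor quantization for the weight, and the common convention that the bias scale is the product of the input and weight scales (zero point 0), so the bias can be added directly to the INT32 accumulator of Conv/Gemm. All function names here are illustrative.

```python
import numpy as np

def quantize_weight_int8(w):
    # Symmetric per-tensor quantization: map max |w| to 127, zero point 0.
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_bias_int32(b, input_scale, weight_scale):
    # Bias scale = input_scale * weight_scale, so the quantized bias
    # lives in the same fixed-point domain as the int32 accumulator.
    scale = input_scale * weight_scale
    return np.round(b / scale).astype(np.int32), scale

w = np.array([[0.5, -1.27], [0.1, 0.0]], dtype=np.float32)
q_w, w_scale = quantize_weight_int8(w)      # int8 weight + its scale

b = np.array([0.02, -0.01], dtype=np.float32)
q_b, b_scale = quantize_bias_int32(b, input_scale=0.05, weight_scale=w_scale)
```

With these inputs the weight scale is 1.27 / 127 = 0.01, so `q_w` holds [[50, -127], [10, 0]]; the bias is quantized with scale 0.05 * 0.01 = 0.0005, giving [40, -20].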