onnxruntime
fe463d49 - Support SmoothQuant for ORT static quantization (#16288)

### Description

Support SmoothQuant for ORT static quantization via Intel Neural Compressor.

> Note: Please use neural-compressor==2.2 to try the SmoothQuant function.

### Motivation and Context

For large language models (LLMs) with huge numbers of parameters, systematic outliers make quantization of activations difficult. As a training-free post-training quantization (PTQ) solution, SmoothQuant offline migrates this difficulty from activations to weights with a mathematically equivalent transformation: per channel, activations are divided by a smoothing factor while the corresponding weight channels are multiplied by the same factor, so the layer output is unchanged. Integrating SmoothQuant into ORT quantization can improve the accuracy of INT8 LLMs.

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
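As a minimal sketch of how this could be used, the example below runs ORT static quantization with SmoothQuant enabled through `quantize_static`'s `extra_options`. The specific option keys (`SmoothQuant`, `SmoothQuantAlpha`, `SmoothQuantFolding`), the model paths, and the random-data calibration reader are illustrative assumptions, not taken verbatim from this commit:

```python
# Sketch: enabling SmoothQuant in ORT static quantization.
# Assumes neural-compressor==2.2 is installed alongside onnxruntime;
# "model.onnx", the input name/shape, and the option keys are placeholders.
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)


class RandomCalibrationDataReader(CalibrationDataReader):
    """Feeds a few random batches for calibration (stand-in for real data)."""

    def __init__(self, input_name, shape, num_batches=8):
        self._batches = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(num_batches)]
        )

    def get_next(self):
        return next(self._batches, None)


reader = RandomCalibrationDataReader(input_name="input", shape=(1, 3, 224, 224))

quantize_static(
    model_input="model.onnx",        # hypothetical FP32 model path
    model_output="model_int8.onnx",
    calibration_data_reader=reader,
    quant_format=QuantFormat.QOperator,
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
    extra_options={
        "SmoothQuant": True,         # apply SmoothQuant before quantization
        "SmoothQuantAlpha": 0.5,     # splits quantization difficulty between activations and weights
        "SmoothQuantFolding": True,  # fold the inserted scaling into adjacent weights
    },
)
```

The alpha value trades off where the quantization difficulty lands: larger values shift more of it onto the weights, smaller values leave more on the activations; 0.5 is the balanced default suggested by the SmoothQuant paper.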