onnxruntime
7e613ee8 - [quant] Support act_order inputs in MatMulNBits and new quantization algorithm "hqq" (#19106)

### Description
1. Support loading GPTQ-quantized Hugging Face checkpoints such as [TheBloke/Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ).
2. Support GPTQ's act_order option (quantizing columns in order of activation importance) in MatMulNBits.
3. Support the [HQQ](https://mobiusml.github.io/hqq_blog/) algorithm for quantizing MatMul weights, and add a quantization script.
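For context, MatMulNBits consumes weights stored as low-bit integer codes with per-group scales and zero-points. Below is a minimal numpy sketch of plain group-wise 4-bit asymmetric quantization (the baseline scheme; GPTQ and HQQ refine how the codes are chosen, which is not shown here). The function names are illustrative, not ONNX Runtime API.

```python
import numpy as np

def quantize_4bit_groupwise(w, group_size=32):
    """Asymmetric 4-bit group-wise quantization of a flat weight array.

    Returns integer codes in [0, 15] plus a per-group scale and
    zero-point, the layout MatMulNBits-style kernels dequantize from.
    (Illustrative sketch, not the ONNX Runtime implementation.)
    """
    w = w.reshape(-1, group_size)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0               # 2**4 - 1 quantization levels
    scale = np.where(scale == 0, 1.0, scale)   # guard all-constant groups
    zero = np.round(-wmin / scale)             # per-group zero-point
    q = np.clip(np.round(w / scale + zero), 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize_4bit_groupwise(q, scale, zero):
    """Reconstruct float weights from codes, scales, and zero-points."""
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)
q, s, z = quantize_4bit_groupwise(w)
w_hat = dequantize_4bit_groupwise(q, s, z).reshape(-1)
```

With 4 bits per group the worst-case reconstruction error is about half the group's scale, which is why smaller group sizes trade memory for accuracy.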