onnxruntime
7e613ee8 - [quant] Support act_order inputs in MatMulNBits and new quantization algorithm "hqq" (#19106)

### Description
1. Support loading GPTQ-quantized Hugging Face checkpoints such as [TheBloke/Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ).
2. Support GPTQ's act_order option (quantizing columns in order of activation importance) in MatMulNBits.
3. Support the [HQQ](https://mobiusml.github.io/hqq_blog/) algorithm for quantizing MatMul weights, and add a quantization script.
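For context, MatMulNBits consumes weights stored as low-bit integer codes with per-group scales and zero-points. Below is a minimal numpy sketch of plain group-wise 4-bit asymmetric quantization (the baseline scheme; GPTQ and HQQ refine how the codes are chosen, which is not shown here). The function names are illustrative, not ONNX Runtime API.

```python
import numpy as np

def quantize_4bit_groupwise(w, group_size=32):
    """Asymmetric 4-bit group-wise quantization of a flat weight array.

    Returns integer codes in [0, 15] plus a per-group scale and
    zero-point, the layout MatMulNBits-style kernels dequantize from.
    (Illustrative sketch, not the ONNX Runtime implementation.)
    """
    w = w.reshape(-1, group_size)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0               # 2**4 - 1 quantization levels
    scale = np.where(scale == 0, 1.0, scale)   # guard all-constant groups
    zero = np.round(-wmin / scale)             # per-group zero-point
    q = np.clip(np.round(w / scale + zero), 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize_4bit_groupwise(q, scale, zero):
    """Reconstruct float weights from codes, scales, and zero-points."""
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)
q, s, z = quantize_4bit_groupwise(w)
w_hat = dequantize_4bit_groupwise(q, s, z).reshape(-1)
```

With 4 bits per group the worst-case reconstruction error is about half the group's scale, which is why smaller group sizes trade memory for accuracy.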