onnxruntime
d5d3b287 - Enable 2bit CPU matmul fallback (#25582)

Commit
143 days ago
### Description
- Enable 2-bit MatMulNBits on CPU.
- Fall back to ComputeBUnpacked (dequantizes to fp32).
- Adapt the quantize script to support 2 bits.
- Add 2-bit unit tests.
- [Blockwise quantization for 2 bits is already implemented](https://github.com/microsoft/onnxruntime/blob/b9575476e94daa9c6578aba92d8f04324dd15815/onnxruntime/core/mlas/lib/q4_dq.cpp#L407).

### Motivation and Context
- Working on enabling BitNet and low-bit LLMs.

---------

Co-authored-by: Hector Li <hecli@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
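
To illustrate the idea behind the fallback (unpack the 2-bit weights, dequantize them to floating point, then run an ordinary matmul), here is a minimal pure-Python sketch. It is not the ONNX Runtime implementation: the packing order (four codes per byte, lowest bits first), the per-block scale and zero-point scheme, and every function name below are assumptions chosen for illustration, and may differ from the layout that MatMulNBits and MLAS actually use.

```python
from typing import List, Tuple

BITS = 2
LEVELS = (1 << BITS) - 1      # largest code value: 3
VALS_PER_BYTE = 8 // BITS     # four 2-bit codes per byte

Block = Tuple[bytes, float, int]  # (packed codes, scale, zero point)


def quantize_block(block: List[float]) -> Block:
    """Quantize one block to 2-bit codes (hypothetical asymmetric scheme)."""
    lo, hi = min(min(block), 0.0), max(max(block), 0.0)
    scale = (hi - lo) / LEVELS if hi > lo else 1.0
    zero_point = min(LEVELS, max(0, round(-lo / scale)))
    codes = [min(LEVELS, max(0, round(x / scale) + zero_point)) for x in block]
    packed = bytearray((len(codes) + VALS_PER_BYTE - 1) // VALS_PER_BYTE)
    for i, code in enumerate(codes):
        # Assumed layout: element i sits at bit offset 2 * (i % 4) of its byte.
        packed[i // VALS_PER_BYTE] |= code << (BITS * (i % VALS_PER_BYTE))
    return bytes(packed), scale, zero_point


def dequantize_block(packed: bytes, scale: float, zero_point: int, n: int) -> List[float]:
    """Unpack n 2-bit codes and map them back to floats."""
    out = []
    for i in range(n):
        code = (packed[i // VALS_PER_BYTE] >> (BITS * (i % VALS_PER_BYTE))) & LEVELS
        out.append((code - zero_point) * scale)
    return out


def quantize_column(col: List[float], block_size: int) -> List[Block]:
    """Split one weight column (length K) into blocks and quantize each."""
    return [quantize_block(col[i:i + block_size]) for i in range(0, len(col), block_size)]


def dequantize_column(blocks: List[Block], k: int, block_size: int) -> List[float]:
    """Rebuild a full floating-point column from its quantized blocks."""
    out: List[float] = []
    for idx, (packed, scale, zero_point) in enumerate(blocks):
        n = min(block_size, k - idx * block_size)
        out.extend(dequantize_block(packed, scale, zero_point, n))
    return out


def matmul_fallback(a: List[List[float]], b_cols: List[List[Block]],
                    k: int, block_size: int) -> List[List[float]]:
    """Dequantize all of B first, then do a plain float matmul (A is M x K)."""
    b_fp = [dequantize_column(blocks, k, block_size) for blocks in b_cols]
    return [[sum(x * w for x, w in zip(row, col)) for col in b_fp] for row in a]
```

A fallback of this shape trades speed and memory for simplicity: the whole weight matrix is materialized in floating point before the multiply, so no 2-bit-specific compute kernel is needed.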