onnxruntime
34d41219 - MLAS BitNet: use the llama.cpp dot routines. This serves as a fallback and shows how other BitNet implementations can be added to ORT. TODOs: handle arbitrary K values; support blklen != 256; add a quantizer from fp32/fp16 to BitNet; make the MatMulNBits kernel work with MLAS for ternary weights.

Commit
1 year ago
MLAS BitNet: use the llama.cpp dot routines.

This serves as a fallback and shows how other BitNet implementations can be added to ORT.

TODOs:
- handle arbitrary K values
- support blklen != 256
- add a quantizer from fp32/fp16 to BitNet
- make the MatMulNBits kernel work with MLAS for ternary weights

Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
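
For readers unfamiliar with ternary weights, the following is a minimal sketch of a blocked ternary dot product, assuming a block length of 256 as in the commit message. Everything here is hypothetical and chosen for illustration: the `TernaryBlock` layout, the 2-bit encoding, and the names `PackTernaryBlock` and `TernaryDot` are not the llama.cpp or MLAS definitions. The assertion on `K` mirrors the current restriction that the TODO list wants to lift.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout for illustration only; NOT the llama.cpp or MLAS format.
constexpr std::size_t kBlkLen = 256;

struct TernaryBlock {
    float scale;                        // per-block weight scale
    std::uint8_t packed[kBlkLen / 4];   // four 2-bit codes per byte; codes 0,1,2 mean -1,0,+1
};

// Pack kBlkLen ternary values (-1, 0, +1) into one block.
inline TernaryBlock PackTernaryBlock(const std::int8_t* w, float scale) {
    TernaryBlock blk{};                 // zero-initializes scale and packed bytes
    blk.scale = scale;
    for (std::size_t i = 0; i < kBlkLen; ++i) {
        assert(w[i] >= -1 && w[i] <= 1);
        const std::uint8_t code = static_cast<std::uint8_t>(w[i] + 1);
        blk.packed[i / 4] |= static_cast<std::uint8_t>(code << (2 * (i % 4)));
    }
    return blk;
}

// Dot product of one packed weight row with K int8 activations.
// K must be a multiple of kBlkLen in this sketch.
inline float TernaryDot(const std::vector<TernaryBlock>& row,
                        const std::int8_t* a, float a_scale, std::size_t K) {
    assert(K % kBlkLen == 0);
    assert(row.size() == K / kBlkLen);
    float acc = 0.0f;
    for (std::size_t b = 0; b < row.size(); ++b) {
        std::int32_t sum = 0;           // integer accumulation within a block
        for (std::size_t i = 0; i < kBlkLen; ++i) {
            const int code = (row[b].packed[i / 4] >> (2 * (i % 4))) & 0x3;
            sum += (code - 1) * static_cast<std::int32_t>(a[b * kBlkLen + i]);
        }
        acc += row[b].scale * static_cast<float>(sum);  // apply the block scale once
    }
    return acc * a_scale;               // apply the activation scale at the end
}
```

Because each weight is -1, 0, or +1, the inner loop needs no real multiplications; an optimized routine would typically replace it with SIMD adds and subtracts over the unpacked codes.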