onnxruntime
34d41219 - MLAS BitNet: use the llama.cpp dot routines. This serves as a fallback and shows how other BitNet implementations can be added to ORT. TODOs: handle arbitrary K values; support blklen != 256; add a quantizer from fp32/fp16 to BitNet; make the MatmulNBit kernel work with MLAS with ternary weights.

Commit

1 year ago

MLAS BitNet: use the llama.cpp dot routines. This serves as a fallback and shows how other BitNet implementations can be added to ORT.

TODOs:
- handle arbitrary K values
- support blklen != 256
- add a quantizer from fp32/fp16 to BitNet
- make the MatmulNBit kernel work with MLAS with ternary weights

Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>

Author

liqunfu

Parents

ff531cbe
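
For readers unfamiliar with the terms in the commit message, the following is a minimal reference sketch of a block-wise dot product over ternary weights. It is not the MLAS or llama.cpp code from this commit. The 2-bit packing layout, the per-block float scale, and the names `pack_ternary`, `ternary_dot`, and `BLOCK_LEN` are all assumptions made for illustration. It mirrors the limitations listed in the TODOs by accepting only a block length of 256 and K values that are multiples of it.

```python
# Illustrative sketch only: the layout and all names here are hypothetical,
# not taken from onnxruntime, MLAS, or llama.cpp.

BLOCK_LEN = 256  # assumed fixed block length (the commit lists blklen != 256 as a TODO)


def pack_ternary(weights):
    """Pack ternary weights (-1, 0, +1) four per byte, 2 bits each.

    Each weight w is stored as the code w + 1 (0, 1, or 2); this encoding
    is an assumption chosen for simplicity.
    """
    if len(weights) % 4 != 0:
        raise ValueError("number of weights must be a multiple of 4")
    packed = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            if w not in (-1, 0, 1):
                raise ValueError("weights must be ternary: -1, 0, or +1")
            byte |= (w + 1) << (2 * j)
        packed.append(byte)
    return bytes(packed)


def ternary_dot(packed, scales, activations):
    """Dot product of packed ternary weights with float activations.

    `scales` holds one float per block of BLOCK_LEN weights. K (the length
    of `activations`) must be a multiple of BLOCK_LEN in this sketch.
    """
    k = len(activations)
    if k % BLOCK_LEN != 0:
        raise ValueError("K must be a multiple of the block length")
    if len(packed) * 4 != k or len(scales) != k // BLOCK_LEN:
        raise ValueError("packed weights, scales, and activations disagree on K")

    total = 0.0
    for block in range(k // BLOCK_LEN):
        acc = 0.0
        base = block * BLOCK_LEN
        for i in range(BLOCK_LEN):
            idx = base + i
            code = (packed[idx // 4] >> (2 * (idx % 4))) & 0b11
            w = code - 1  # decode back to -1, 0, or +1
            # Ternary weights need no multiply: add, subtract, or skip.
            if w == 1:
                acc += activations[idx]
            elif w == -1:
                acc -= activations[idx]
        total += scales[block] * acc
    return total
```

Because every weight is -1, 0, or +1, the inner loop contains only additions and subtractions, and the single multiply per block applies the scale. An optimized kernel would vectorize this loop; the scalar form above is only meant to make the arithmetic explicit.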
