llama.cpp
e9b66ee9 - metal : add Q4_1 implementation (#1785)

Commit
2 years ago
metal : add Q4_1 implementation (#1785)

23.3 ms / token, so just ~1% slower than q4_0.
Achieves 290 GB/s memory throughput.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>