llama.cpp
e9b66ee9
- metal : add Q4_1 implementation (#1785)
Commit
2 years ago
metal : add Q4_1 implementation (#1785)

23.3 ms / token, so just ~1% slower than q4_0.
Achieves 290 GB/s memory throughput.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
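
For readers unfamiliar with the format named in the commit, the following is a minimal sketch of what a Q4_1-style block could look like. It does not reproduce the Metal kernel from this commit. The layout is an assumption based on the ggml format of that period: 32 weights per block, an fp16 scale `d`, an fp16 minimum `m`, and 16 bytes of packed 4-bit values, with each weight reconstructed as `d * q + m`. The function names `quantize_q4_1` and `dequantize_q4_1` are hypothetical and used only for illustration.

```python
import struct

QK4_1 = 32  # weights per block (assumed to match ggml's Q4_1 block size)


def quantize_q4_1(x):
    """Pack 32 floats into one 20-byte block: fp16 d, fp16 m, 16 nibble bytes."""
    assert len(x) == QK4_1
    lo, hi = min(x), max(x)
    d = (hi - lo) / 15           # 4 bits give 16 levels, so 15 steps
    inv = 1.0 / d if d else 0.0  # a constant block maps every value to q = 0
    q = [min(15, int((v - lo) * inv + 0.5)) for v in x]
    half = QK4_1 // 2
    # Assumed nibble order: low nibble holds element j, high nibble element j + 16.
    qs = bytes(q[j] | (q[j + half] << 4) for j in range(half))
    return struct.pack("<ee", d, lo) + qs


def dequantize_q4_1(block):
    """Reconstruct 32 floats from one block as d * q + m."""
    d, m = struct.unpack_from("<ee", block, 0)
    qs = block[4:]
    half = QK4_1 // 2
    out = [0.0] * QK4_1
    for j in range(half):
        out[j] = d * (qs[j] & 0x0F) + m
        out[j + half] = d * (qs[j] >> 4) + m
    return out
```

Under these assumptions a block occupies 20 bytes for 32 weights, that is 5 bits per weight. The only difference from a Q4_0-style block in this sketch is the extra per-block minimum `m`, which would be consistent with the commit's report that Q4_1 is only slightly slower than q4_0.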
References
#1785 - metal : add Q4_1 implementation
Author
ikawrakow
Parents
4f0154b0