llama.cpp
1cbf5614 - metal : new q4_0 matrix-vector kernel (#2188)

Commit
2 years ago
metal : new q4_0 matrix-vector kernel (#2188)

Prefetch data to improve GPU utilization. ~48% faster for 33B model.
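
For readers unfamiliar with the operation this kernel computes, here is a minimal scalar sketch in C of a matrix-vector product over 4-bit quantized blocks. It is not the Metal kernel from this commit and does not attempt the prefetching the message describes. The struct `block_q4_sketch` and the functions `row_dot_q4` and `matvec_q4` are hypothetical names made up for illustration. The block layout (32 weights per block, one scale, low nibbles holding the first 16 weights, high nibbles the last 16, each stored with an offset of 8) is an assumption about the q4_0 format, and the scale is a plain `float` here rather than a half-precision value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QK 32 /* weights per block (assumed to match q4_0's block size) */

/* Hypothetical stand-in for a q4_0 block: one scale plus 32 packed 4-bit
   values. The scale is a float here for portability (an assumption). */
typedef struct {
    float   d;          /* per-block scale */
    uint8_t qs[QK / 2]; /* two 4-bit weights per byte */
} block_q4_sketch;

/* Dot product of one quantized row with a dense vector x of length n. */
static float row_dot_q4(const block_q4_sketch *row, const float *x, size_t n) {
    assert(n % QK == 0);
    float sum = 0.0f;
    for (size_t b = 0; b < n / QK; ++b) {
        const float *xb = x + b * QK;
        float acc = 0.0f;
        for (int j = 0; j < QK / 2; ++j) {
            /* Assumed layout: low nibble is weight j, high nibble is
               weight j + 16; both are stored with an offset of 8. */
            int lo = (row[b].qs[j] & 0x0F) - 8;
            int hi = (row[b].qs[j] >> 4) - 8;
            acc += (float)lo * xb[j] + (float)hi * xb[j + QK / 2];
        }
        /* Apply the block scale once per block, not once per weight. */
        sum += row[b].d * acc;
    }
    return sum;
}

/* y = W * x, where W has n_rows rows of n_cols quantized weights each. */
static void matvec_q4(const block_q4_sketch *w, const float *x, float *y,
                      size_t n_rows, size_t n_cols) {
    size_t blocks_per_row = n_cols / QK;
    for (size_t r = 0; r < n_rows; ++r) {
        y[r] = row_dot_q4(w + r * blocks_per_row, x, n_cols);
    }
}
```

A GPU version would typically spread the rows and blocks across threads and reduce the partial sums. How the actual kernel does that, and how it prefetches data, is not shown in the commit message above.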