llama.cpp
39e3a429 - iq3_s: somewhat faster AVX2 dot product

Committed 1 year ago
On a Ryzen 7950X, TG-128 (generating 128 tokens) increases to 16 t/s (tokens/second) from 15.5 t/s using 16 threads. With 8 threads it is 13.85 t/s vs 11.75 t/s. PP-512 (processing a 512-token prompt) increases to 28.5 t/s from 23.8 t/s.
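To put these figures in relative terms, the short sketch below computes the percentage change for each reported measurement. The names `RESULTS` and `speedup_pct` are illustrative only and are not part of llama.cpp. The numbers are exactly those quoted in the commit message. The thread count for the PP-512 run is not stated, so it is left unlabeled.

```python
# Throughput figures (tokens/s) quoted in the commit message: (label, before, after).
# Hypothetical helper data for illustration; not part of llama.cpp.
RESULTS = [
    ("TG-128, 16 threads", 15.5, 16.0),
    ("TG-128, 8 threads", 11.75, 13.85),
    ("PP-512", 23.8, 28.5),
]


def speedup_pct(before: float, after: float) -> float:
    """Relative throughput change in percent: (after - before) / before * 100."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for label, before, after in RESULTS:
        print(f"{label}: {before} -> {after} t/s ({speedup_pct(before, after):+.1f}%)")
```

Running it prints gains of about +3.2% for TG-128 with 16 threads, +17.9% for TG-128 with 8 threads, and +19.7% for PP-512. In other words, the improvement is modest at 16 threads and substantially larger in the other two measurements.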
Author: Iwan Kawrakow