llama.cpp
160aceca - iq3_s_multiplier: CUDA and AVX2 works

Commit
1 year ago
iq3_s_multiplier: CUDA and AVX2 works

CUDA is 153.8 t/s, so faster than lookup table (151 t/s) and Q3_K_S (145 t/s).
AVX2 on Ryzen-5975WX is 13.7 t/s, so faster than lookup (12.7 t/s), but slower than Q3_K_S (15.5 t/s).
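
For a sense of scale, the throughput figures quoted above can be turned into relative differences. The sketch below is only arithmetic on the numbers in the commit message; the helper name `relative_change` and the dictionary labels are made up for illustration and are not part of llama.cpp.

```python
# Throughput (tokens/second) exactly as quoted in the commit message.
# The dictionary keys are illustrative labels, not llama.cpp identifiers.
cuda = {"multiplier": 153.8, "lookup": 151.0, "Q3_K_S": 145.0}
avx2 = {"multiplier": 13.7, "lookup": 12.7, "Q3_K_S": 15.5}


def relative_change(new, baseline):
    """Percent change of `new` relative to `baseline` (positive means faster)."""
    return (new / baseline - 1.0) * 100.0


def report(backend, results):
    """Compare the multiplier variant against every other entry in `results`."""
    ours = results["multiplier"]
    lines = []
    for name, tps in results.items():
        if name == "multiplier":
            continue
        change = relative_change(ours, tps)
        lines.append(f"{backend}: multiplier vs {name}: {change:+.1f}%")
    return lines


for line in report("CUDA", cuda) + report("AVX2", avx2):
    print(line)
```

On these figures the multiplier variant is roughly 2% ahead of the lookup table and 6% ahead of Q3_K_S on CUDA. On AVX2 it is about 8% ahead of the lookup table but about 12% behind Q3_K_S.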
Author
Iwan Kawrakow