llama.cpp
160aceca
- iq3_s_multiplier: CUDA and AVX2 works
Commit
1 year ago
iq3_s_multiplier: CUDA and AVX2 works

CUDA is 153.8 t/s, so faster than lookup table (151 t/s) and Q3_K_S (145 t/s).
AVX2 on Ryzen-5975WX is 13.7 t/s, so faster than lookup (12.7 t/s), but slower than Q3_K_S (15.5 t/s).
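To put the figures above in relative terms, here is a minimal sketch that converts the reported tokens-per-second numbers into percentage differences. The throughput values are copied from the commit message; the names `relative_change` and `summarize` are illustrative helpers and are not part of llama.cpp.

```python
# Throughput figures (tokens/second) copied from the commit message.
CUDA = {"multiplier": 153.8, "lookup": 151.0, "Q3_K_S": 145.0}
AVX2 = {"multiplier": 13.7, "lookup": 12.7, "Q3_K_S": 15.5}


def relative_change(new: float, baseline: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    return (new / baseline - 1.0) * 100.0


def summarize(results: dict) -> dict:
    """Compare the multiplier variant against every other entry, in percent."""
    multiplier_tps = results["multiplier"]
    return {
        name: round(relative_change(multiplier_tps, tps), 1)
        for name, tps in results.items()
        if name != "multiplier"
    }


if __name__ == "__main__":
    for backend, results in (("CUDA", CUDA), ("AVX2", AVX2)):
        for baseline, pct in summarize(results).items():
            print(f"{backend}: multiplier vs {baseline}: {pct:+.1f}%")
```

On these numbers the multiplier variant comes out roughly 1.9% ahead of the lookup table and 6.1% ahead of Q3_K_S on CUDA. On AVX2 it is about 7.9% ahead of the lookup table but about 11.6% behind Q3_K_S.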
References
#5867 - IQ3_S: multiplier based code book
Author
Iwan Kawrakow
Parents
4c21c826