llama.cpp
82380acf
iq1_s: we can do even better
1 year ago
iq1_s: we can do even better

Spent one of the 4 scale bits on the sign of a 0.125 shift. That is, the quants are now -1 + delta, delta, and 1 + delta, where delta is +/- 0.125. CUDA works, with the same performance as before. PPL(LLaMA-v2-7B) is now 11.85!
References
#5999 - 1.5 bit: we can do even better
Author
Iwan Kawrakow
Parents
be858f62