llama.cpp
4be44b7c
- iq1_s: use IQ2_XXS for attn_output
Commit
1 year ago
iq1_s: use IQ2_XXS for attn_output

At a cost of 0.04 extra bpw this gives a big improvement in PPL.
References
#5453 - 1.5 bit quantization
Author
Iwan Kawrakow
Committer
Iwan Kawrakow
Parents
307c5f61