Use full range for q4_0 quantization

Committed 2 years ago

By keeping the sign of the value with the highest magnitude, we can make sure that value maps to -8, the lowest 4-bit level, which is currently unused. This is a bit of a freebie, since it is fully backwards compatible with the current format: only the choice of the block scale d changes, and each 4-bit code q still decodes as (q - 8) * d.

quantize-stats output:

before(7B): q4_0 : mse 0.00000492, maxerr 0.14257812
after(7B):  q4_0 : mse 0.00000386, maxerr 0.18200684

(Most layers have a reduced maxerr under this rule, but the total max error is indeed slightly higher.)
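A minimal Python sketch of the two scale rules applied to a single block, for illustration only; it is not the ggml C source. It assumes ggml's 32-value blocks (QK = 32) and that the previous rule divided the absolute maximum by 7, which is what leaves code 0 (that is, -8) unused. The names quantize_old, quantize_new, dequantize and error_stats are hypothetical, and error_stats only mimics the mse/maxerr columns of quantize-stats rather than reproducing that tool.

```python
import math
import random

# Hypothetical helpers: a sketch of the q4_0 scale rules, not the ggml source.
QK = 32  # assumed number of values sharing one scale d in a q4_0 block


def quantize_old(block):
    """Assumed previous rule: scale by absmax / 7, so codes land in 1..15."""
    amax = max(abs(v) for v in block)
    d = amax / 7.0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    # Round half away from zero (like C roundf), then offset by 8.
    qs = [int(math.copysign(math.floor(abs(v * inv_d) + 0.5), v)) + 8 for v in block]
    return d, qs


def quantize_new(block):
    """Rule from this commit: keep the sign of the largest-magnitude value, map it to -8."""
    max_v = 0.0
    for v in block:
        if abs(v) > abs(max_v):  # strict '>' keeps the first value on ties
            max_v = v
    d = max_v / -8.0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    # v * inv_d lies in [-8, 8]; adding 8.5 and flooring rounds to the nearest
    # code, and the clamp turns the unrepresentable +8 (code 16) into 15.
    qs = [min(15, max(0, math.floor(v * inv_d + 8.5))) for v in block]
    return d, qs


def dequantize(d, qs):
    """Shared by both rules, which is why old and new data decode the same way."""
    return [(q - 8) * d for q in qs]


def error_stats(block, d, qs):
    """Mean squared error and max absolute error of one quantized block."""
    errs = [x - y for x, y in zip(block, dequantize(d, qs))]
    return sum(e * e for e in errs) / len(errs), max(abs(e) for e in errs)


if __name__ == "__main__":
    rng = random.Random(0)
    block = [rng.gauss(0.0, 0.05) for _ in range(QK)]
    for name, quantize in (("old", quantize_old), ("new", quantize_new)):
        d, qs = quantize(block)
        mse, maxerr = error_stats(block, d, qs)
        print(f"{name}: min code {min(qs)}, mse {mse:.8f}, maxerr {maxerr:.8f}")
```

The step size shrinks from |max|/7 to |max|/8, which is where the lower mse comes from. The clamp to 15 is the cost: a value close in magnitude but opposite in sign to the maximum would need +8, which 4 bits cannot hold, so its error can exceed the old half-step bound. That clipping is one plausible source of the slightly higher total maxerr reported above.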

Author: unbounded
Committer: ggerganov
Parents: 0e018fe0

llama.cpp 3698f79e - Use full range for q4_0 quantization
