llama.cpp
e435bfd9 - RMSE-optimized quants for all quantization types

Commit

3 years ago

RMSE-optimized quants for all quantization types. This new option is ON by default; it can be turned off by setting LLAMA_NO_RMSE. With the option enabled, Q4_3 quantization yields a perplexity of 6.0344, which is 0.0273 lower than plain Q4_3 quantization.
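The commit message does not describe the optimization itself. As a rough illustration of what "RMSE-optimized" can mean, the C sketch below assumes a scale-and-min 4-bit block in which each value is reconstructed as d*q + m with q in 0..15. It compares plain min/max scaling with a refinement that alternates a least-squares fit of (d, m) with re-rounding of q. The block size of 16, the function names `quantize_with`, `quantize_simple` and `quantize_rmse`, and the refinement strategy are all hypothetical. This is not the code from this commit, and it does not model the real Q4_3 block layout or storage types.

```c
#define BLOCK 16  /* hypothetical block size, chosen for illustration */
#define NMAX  15  /* 4-bit quants take integer values 0..15 */

/* Hypothetical helper: pick the nearest q in [0, NMAX] for x ~ d*q + m
   and return the sum of squared reconstruction errors over the block. */
static float quantize_with(const float *x, float d, float m, int *q) {
    float err = 0.0f;
    for (int i = 0; i < BLOCK; ++i) {
        float t = d != 0.0f ? (x[i] - m) / d : 0.0f;
        if (t < 0.0f) t = 0.0f;              /* clamp before rounding */
        if (t > (float)NMAX) t = (float)NMAX;
        q[i] = (int)(t + 0.5f);              /* t >= 0, so this rounds to nearest */
        float diff = x[i] - (d * q[i] + m);
        err += diff * diff;
    }
    return err;
}

/* Baseline: m = block minimum, d = (max - min) / NMAX. */
static float quantize_simple(const float *x, float *d, float *m, int *q) {
    float lo = x[0], hi = x[0];
    for (int i = 1; i < BLOCK; ++i) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    *m = lo;
    *d = (hi - lo) / NMAX;
    return quantize_with(x, *d, *m, q);
}

/* Hypothetical RMSE refinement: for fixed q, solve the 2x2 least-squares
   normal equations for (d, m); for fixed (d, m), re-round q. Neither step
   increases the squared error (up to float rounding), and only strict
   improvements over the best result so far are accepted. */
static float quantize_rmse(const float *x, float *d, float *m, int *q, int iters) {
    int cur[BLOCK];
    float best = quantize_simple(x, d, m, q);
    for (int i = 0; i < BLOCK; ++i) cur[i] = q[i];
    for (int it = 0; it < iters; ++it) {
        double sq = 0, sqq = 0, sx = 0, sqx = 0;
        for (int i = 0; i < BLOCK; ++i) {
            sq  += cur[i];
            sqq += (double)cur[i] * cur[i];
            sx  += x[i];
            sqx += (double)cur[i] * x[i];
        }
        double det = BLOCK * sqq - sq * sq;
        if (det == 0.0) break;               /* all quants equal: fit is degenerate */
        float nd = (float)((BLOCK * sqx - sq * sx) / det);
        float nm = (float)((sqq * sx - sq * sqx) / det);
        float err = quantize_with(x, nd, nm, cur);
        if (!(err < best)) break;            /* no further improvement */
        best = err;
        *d = nd;
        *m = nm;
        for (int i = 0; i < BLOCK; ++i) q[i] = cur[i];
    }
    return best;
}
```

Calling `quantize_simple` and `quantize_rmse` on the same block lets you compare their squared errors. The refined error is never larger than the baseline, because the refinement starts from the baseline and keeps only improvements.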

Author

Iwan Kawrakow

Committer

ggerganov

Parents

0e018fe0
