llama.cpp
f7d05095 - Q4_2 quantization with rmse-optimized scale and quants (#1062)

Commit

3 years ago

Q4_2 quantization with rmse-optimized scale and quants (#1062)

* Q4_2 quantization with rmse-optimized scale and quants

For quantize-stats we get
q4_2: rmse 0.00159301, maxerr 0.17480469, 95pct<0.0030, median<0.0012

For 7B perplexity with BLAS enabled we get 6.2038 after 655 chunks.

Quantization is slow (~90 seconds on my Mac for 7B) as not multi-threaded as in PR #896.

* ggml : satisfy the sanitizer builds

Not sure why this makes them fail

* Better follow ggml conventions for function names

* Fixed type as per reviewer comment

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

References

#1062 - Q4_2 quantization with rmse-optimized scale and quants

Author

ikawrakow

Parents

884e7d7a
