llama.cpp
ggml-quants : weighted rounding algorithms with cumulative search
#12557
Open

Commits
  • ggml-quants : improve IQ4_NL, IQ4_XS, and Q3_K
    compilade committed 310 days ago
  • ggml-quants : better and faster make_qkxs_quants
    compilade committed 310 days ago
  • ggml-quants : improve imatrix behavior for TQ1_0, TQ2_0, Q4_0, Q5_0
    compilade committed 310 days ago
  • ggml-quants : improve TQ2_0 imatrix
    compilade committed 296 days ago
  • ggml-quants : remove some commented code
    compilade committed 288 days ago
  • ggml-quants : faster exhaustive IQ4_NL rounding with k_heap
    compilade committed 288 days ago
  • ggml-quants : use a max-heap for linear quants like Q3_K
    compilade committed 283 days ago
  • ggml-quants : use qkxh in more places
    compilade committed 282 days ago
  • ggml-quants : use a max-heap for TQ1_0 and TQ2_0 quantization
    compilade committed 281 days ago
  • ggml-quants : remove slower qsort-based cumulative search
    compilade committed 281 days ago
  • Merge branch 'master' into compilade/optimal-rounding
    compilade committed 281 days ago
  • ggml-quants : restore Q2_K use of make_qp_quants
    compilade committed 281 days ago
  • ggml-quants : fix some edge cases in make_qkxh_nl_quants
    compilade committed 280 days ago
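
To give an idea of what the "weighted rounding with cumulative search" named in the PR title refers to, below is a minimal illustrative sketch, not the PR's actual `make_qkxs_quants` code. It assumes a symmetric linear quantizer q[i] ∈ [-nmax, nmax] with a single scale, positive per-element importance weights `w[i]` (e.g. from an imatrix), and a hypothetical function name `quantize_weighted`. The idea: every candidate inverse scale at which some rounded value changes is an event; sorting the events and updating the weighted sums cumulatively lets one scan all distinct roundings and keep the one maximizing (Σ w·x·q)² / (Σ w·q²).

```c
// Illustrative sketch only (assumed names, not llama.cpp's implementation):
// weighted rounding of x[0..n) to integers in [-nmax, nmax] with one scale,
// chosen by a cumulative search over the thresholds where any round(x*id)
// changes value. Weights w[i] are assumed strictly positive.
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    float inv_scale; // 1/scale at which element i gains one quantization step
    int   i;
} event_t;

static int cmp_event(const void *a, const void *b) {
    float fa = ((const event_t *)a)->inv_scale;
    float fb = ((const event_t *)b)->inv_scale;
    return (fa > fb) - (fa < fb);
}

// Writes the quantized values to q and returns the chosen scale.
static float quantize_weighted(const float *x, const float *w, int8_t *q, int n, int nmax) {
    event_t *events = malloc((size_t)n * nmax * sizeof(*events));
    int     *absq   = calloc((size_t)n, sizeof(*absq)); // current |q_i| during the scan
    int n_events = 0;
    for (int i = 0; i < n; ++i) {
        float ax = fabsf(x[i]);
        if (ax == 0.0f) continue;
        for (int m = 0; m < nmax; ++m) {
            // |x_i| * (1/scale) rounds up to m+1 once 1/scale >= (m + 0.5) / |x_i|
            events[n_events].inv_scale = (m + 0.5f) / ax;
            events[n_events].i = i;
            n_events++;
        }
    }
    qsort(events, (size_t)n_events, sizeof(*events), cmp_event);

    // Cumulative sums of w*|x|*|q| and w*q^2, updated one step per event.
    float sum_wxq = 0.0f, sum_wqq = 0.0f;
    float best = 0.0f, best_inv = 0.0f, best_scale = 0.0f;
    for (int e = 0; e < n_events; ++e) {
        const int i = events[e].i;
        absq[i] += 1;
        sum_wxq += w[i] * fabsf(x[i]);            // |q_i| grew by 1
        sum_wqq += w[i] * (float)(2*absq[i] - 1); // q^2 - (q-1)^2 == 2q - 1
        // Only evaluate once a group of equal thresholds is fully applied.
        if (e + 1 < n_events && events[e + 1].inv_scale == events[e].inv_scale) continue;
        // Maximizing (sum w*x*q)^2 / (sum w*q^2) minimizes the weighted error
        // sum w*(x - scale*q)^2 over all scales for this rounding.
        if (sum_wqq > 0.0f && sum_wxq * sum_wxq > best * sum_wqq) {
            best       = sum_wxq * sum_wxq / sum_wqq;
            best_inv   = events[e].inv_scale;
            best_scale = sum_wxq / sum_wqq;       // weighted least-squares scale
        }
    }
    for (int i = 0; i < n; ++i) {
        long qi = lroundf(x[i] * best_inv);
        if (qi >  nmax) qi =  nmax;
        if (qi < -nmax) qi = -nmax;
        q[i] = (int8_t)qi;
    }
    free(events);
    free(absq);
    return best_scale;
}
```

A brute-force search over scales would recompute both sums for every candidate; the cumulative formulation updates them in O(1) per event, so all n·nmax distinct roundings are scanned after a single sort. The commits replacing the qsort-based search with a max-heap refine this further, but that variant is not shown here.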