llama.cpp
ggml-quants : weighted rounding algorithms with cumulative search
#12557
Open


compilade wants to merge 13 commits into master from compilade/optimal-rounding
compilade ggml-quants : improve IQ4_NL, IQ4_XS, and Q3_K
dd6b8408
compilade ggml-quants : better and faster make_qkxs_quants
d0060fc4
compilade ggml-quants : improve imatrix behavior for TQ1_0, TQ2_0, Q4_0, Q5_0
6f7fe749
compilade ggml-quants : improve TQ2_0 imatrix
f27c1afc
compilade ggml-quants : remove some commented code
0c9e4424
compilade ggml-quants : faster exhaustive IQ4_NL rounding with k_heap
30ad9c28
compilade ggml-quants : use a max-heap for linear quants like Q3_K
3be11510
compilade ggml-quants : use qkxh in more places
f86b8ff2
compilade ggml-quants : use a max-heap for TQ1_0 and TQ2_0 quantization
3e4b675c
compilade ggml-quants : remove slower qsort-based cumulative search
af23abd3
compilade Merge branch 'master' into compilade/optimal-rounding
a4113972
compilade ggml-quants : restore Q2_K use of make_qp_quants
8b8b88f3
compilade ggml-quants : fix some edge cases in make_qkxh_nl_quants
a5b19439
github-actions added ggml
compilade added generation quality
compilade added research 🔬
compilade added Less than 4 bits
compilade added Review Complexity : Medium
compilade added Tensor Encoding Scheme
compilade marked this pull request as draft 260 days ago
selim1903 commented on 2025-04-01
selim1903 requested changes on 2025-04-01
selim1903 commented on 2025-04-01

