llama.cpp
ggml-quants : weighted rounding algorithms with cumulative search
#12557
Open


compilade wants to merge 13 commits into master from compilade/optimal-rounding
compilade ggml-quants : improve IQ4_NL, IQ4_XS, and Q3_K
dd6b8408
compilade ggml-quants : better and faster make_qkxs_quants
d0060fc4
compilade ggml-quants : improve imatrix behavior for TQ1_0, TQ2_0, Q4_0, Q5_0
6f7fe749
compilade ggml-quants : improve TQ2_0 imatrix
f27c1afc
compilade ggml-quants : remove some commented code
0c9e4424
compilade ggml-quants : faster exhaustive IQ4_NL rounding with k_heap
30ad9c28
compilade ggml-quants : use a max-heap for linear quants like Q3_K
3be11510
compilade ggml-quants : use qkxh in more places
f86b8ff2
compilade ggml-quants : use a max-heap for TQ1_0 and TQ2_0 quantization
3e4b675c
compilade ggml-quants : remove slower qsort-based cumulative search
af23abd3
compilade Merge branch 'master' into compilade/optimal-rounding
a4113972
compilade ggml-quants : restore Q2_K use of make_qp_quants
8b8b88f3
compilade ggml-quants : fix some edge cases in make_qkxh_nl_quants
a5b19439
github-actions added the ggml label
compilade added the generation quality label
compilade added the research 🔬 label
compilade added the Less than 4 bits label
compilade added the Review Complexity : Medium label
compilade added the Tensor Encoding Scheme label
compilade marked this pull request as draft 1 year ago
selim1903 commented on 2025-04-01
selim1903 requested changes on 2025-04-01
selim1903 commented on 2025-04-01

