llama.cpp
ggml-quants : weighted rounding algorithms with cumulative search
#12557

Open

compilade wants to merge 13 commits into master from compilade/optimal-rounding

ggml-quants : improve IQ4_NL, IQ4_XS, and Q3_K

dd6b8408

ggml-quants : better and faster make_qkxs_quants

d0060fc4

ggml-quants : improve imatrix behavior for TQ1_0, TQ2_0, Q4_0, Q5_0

6f7fe749

ggml-quants : improve TQ2_0 imatrix

f27c1afc

ggml-quants : remove some commented code

0c9e4424

ggml-quants : faster exhaustive IQ4_NL rounding with k_heap

30ad9c28

ggml-quants : use a max-heap for linear quants like Q3_K

3be11510

ggml-quants : use qkxh in more places

f86b8ff2

ggml-quants : use a max-heap for TQ1_0 and TQ2_0 quantization

3e4b675c

ggml-quants : remove slower qsort-based cumulative search

af23abd3

Merge branch 'master' into compilade/optimal-rounding

a4113972

ggml-quants : restore Q2_K use of make_qp_quants

8b8b88f3

ggml-quants : fix some edge cases in make_qkxh_nl_quants

a5b19439

github-actions added ggml

compilade added generation quality

compilade added research 🔬

compilade added Less than 4 bits

compilade added Review Complexity : Medium

compilade added Tensor Encoding Scheme

compilade marked this pull request as draft 1 year ago

selim1903 commented on 2025-04-01

selim1903 requested changes on 2025-04-01

selim1903 commented on 2025-04-01

Reviewers

selim1903

Assignees

No one assigned

Labels

generation quality, research 🔬, Less than 4 bits, Review Complexity : Medium, ggml, Tensor Encoding Scheme

Milestone

No milestone
