llama.cpp
ggml-quants : weighted rounding algorithms with cumulative search
#12557
Open
compilade wants to merge 13 commits into master from compilade/optimal-rounding
ggml-quants : improve IQ4_NL, IQ4_XS, and Q3_K (dd6b8408)
ggml-quants : better and faster make_qkxs_quants (d0060fc4)
ggml-quants : improve imatrix behavior for TQ1_0, TQ2_0, Q4_0, Q5_0 (6f7fe749)
ggml-quants : improve TQ2_0 imatrix (f27c1afc)
ggml-quants : remove some commented code (0c9e4424)
ggml-quants : faster exhaustive IQ4_NL rounding with k_heap (30ad9c28)
ggml-quants : use a max-heap for linear quants like Q3_K (3be11510)
ggml-quants : use qkxh in more places (f86b8ff2)
ggml-quants : use a max-heap for TQ1_0 and TQ2_0 quantization (3e4b675c)
ggml-quants : remove slower qsort-based cumulative search (af23abd3)
Merge branch 'master' into compilade/optimal-rounding (a4113972)
ggml-quants : restore Q2_K use of make_qp_quants (8b8b88f3)
ggml-quants : fix some edge cases in make_qkxh_nl_quants (a5b19439)
github-actions added ggml
compilade added generation quality
compilade added research 🔬
compilade added Less than 4 bits
compilade added Review Complexity : Medium
compilade added Tensor Encoding Scheme
compilade marked this pull request as draft 260 days ago
selim1903 commented on 2025-04-01
selim1903 requested changes on 2025-04-01
selim1903 commented on 2025-04-01
Reviewers
selim1903
Assignees
No one assigned
Labels
generation quality
research 🔬
Less than 4 bits
Review Complexity : Medium
ggml
Tensor Encoding Scheme
Milestone
No milestone