llama.cpp
ggml-quants : weighted rounding algorithms with cumulative search
#12557
Open

Commits
  • ggml-quants : improve IQ4_NL, IQ4_XS, and Q3_K
    compilade committed 310 days ago
  • ggml-quants : better and faster make_qkxs_quants
    compilade committed 310 days ago
  • ggml-quants : improve imatrix behavior for TQ1_0, TQ2_0, Q4_0, Q5_0
    compilade committed 310 days ago
  • ggml-quants : improve TQ2_0 imatrix
    compilade committed 296 days ago
  • ggml-quants : remove some commented code
    compilade committed 288 days ago
  • ggml-quants : faster exhaustive IQ4_NL rounding with k_heap
    compilade committed 288 days ago
  • ggml-quants : use a max-heap for linear quants like Q3_K
    compilade committed 283 days ago
  • ggml-quants : use qkxh in more places
    compilade committed 282 days ago
  • ggml-quants : use a max-heap for TQ1_0 and TQ2_0 quantization
    compilade committed 281 days ago
  • ggml-quants : remove slower qsort-based cumulative search
    compilade committed 281 days ago
  • Merge branch 'master' into compilade/optimal-rounding
    compilade committed 281 days ago
  • ggml-quants : restore Q2_K use of make_qp_quants
    compilade committed 281 days ago
  • ggml-quants : fix some edge cases in make_qkxh_nl_quants
    compilade committed 280 days ago
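
To give an idea of what the "weighted rounding with cumulative search" named in the PR title refers to, below is a minimal illustrative sketch, not the PR's actual `make_qkxs_quants` code. It assumes a symmetric linear quantizer q[i] ∈ [-nmax, nmax] with a single scale, positive per-element importance weights `w[i]` (e.g. from an imatrix), and a hypothetical function name `quantize_weighted`. The idea: every candidate inverse scale at which some rounded value changes is an event; sorting the events and updating the weighted sums cumulatively lets one scan all distinct roundings and keep the one maximizing (Σ w·x·q)² / (Σ w·q²).

```c
// Illustrative sketch only (assumed names, not llama.cpp's implementation):
// weighted rounding of x[0..n) to integers in [-nmax, nmax] with one scale,
// chosen by a cumulative search over the thresholds where any round(x*id)
// changes value. Weights w[i] are assumed strictly positive.
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    float inv_scale; // 1/scale at which element i gains one quantization step
    int   i;
} event_t;

static int cmp_event(const void *a, const void *b) {
    float fa = ((const event_t *)a)->inv_scale;
    float fb = ((const event_t *)b)->inv_scale;
    return (fa > fb) - (fa < fb);
}

// Writes the quantized values to q and returns the chosen scale.
static float quantize_weighted(const float *x, const float *w, int8_t *q, int n, int nmax) {
    event_t *events = malloc((size_t)n * nmax * sizeof(*events));
    int     *absq   = calloc((size_t)n, sizeof(*absq)); // current |q_i| during the scan
    int n_events = 0;
    for (int i = 0; i < n; ++i) {
        float ax = fabsf(x[i]);
        if (ax == 0.0f) continue;
        for (int m = 0; m < nmax; ++m) {
            // |x_i| * (1/scale) rounds up to m+1 once 1/scale >= (m + 0.5) / |x_i|
            events[n_events].inv_scale = (m + 0.5f) / ax;
            events[n_events].i = i;
            n_events++;
        }
    }
    qsort(events, (size_t)n_events, sizeof(*events), cmp_event);

    // Cumulative sums of w*|x|*|q| and w*q^2, updated one step per event.
    float sum_wxq = 0.0f, sum_wqq = 0.0f;
    float best = 0.0f, best_inv = 0.0f, best_scale = 0.0f;
    for (int e = 0; e < n_events; ++e) {
        const int i = events[e].i;
        absq[i] += 1;
        sum_wxq += w[i] * fabsf(x[i]);            // |q_i| grew by 1
        sum_wqq += w[i] * (float)(2*absq[i] - 1); // q^2 - (q-1)^2 == 2q - 1
        // Only evaluate once a group of equal thresholds is fully applied.
        if (e + 1 < n_events && events[e + 1].inv_scale == events[e].inv_scale) continue;
        // Maximizing (sum w*x*q)^2 / (sum w*q^2) minimizes the weighted error
        // sum w*(x - scale*q)^2 over all scales for this rounding.
        if (sum_wqq > 0.0f && sum_wxq * sum_wxq > best * sum_wqq) {
            best       = sum_wxq * sum_wxq / sum_wqq;
            best_inv   = events[e].inv_scale;
            best_scale = sum_wxq / sum_wqq;       // weighted least-squares scale
        }
    }
    for (int i = 0; i < n; ++i) {
        long qi = lroundf(x[i] * best_inv);
        if (qi >  nmax) qi =  nmax;
        if (qi < -nmax) qi = -nmax;
        q[i] = (int8_t)qi;
    }
    free(events);
    free(absq);
    return best_scale;
}
```

A brute-force search over scales would recompute both sums for every candidate; the cumulative formulation updates them in O(1) per event, so all n·nmax distinct roundings are scanned after a single sort. The commits replacing the qsort-based search with a max-heap refine this further, but that variant is not shown here.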