llama.cpp
IQ3_S: a much better alternative to Q3_K
#5676
Merged

ikawrakow merged 27 commits into master from ik/iq3_xs_new2
ikawrakow
iq4_nl: squash commits for easier rebase
10a47fa6
Resurrecting iq3_xs
5691fecd
Minor PPL improvement via a block scale fudge factor
76aff093
Minor improvement via 3 neighbours
5be4e7ac
iq3_xs: working scalar and AVX2 dot products
f1255c50
iq3_xs: ARM_NEON dot product - works but extremely slow (10 t/s)
76214ab6
iq3_xs: working Metal implementation
38aa7b17
Adding IQ3_M - IQ3_XS mix with mostly Q4_K
2ec600b7
iq3_xs: a 3.4375 bpw variant
d83fddaa
iq3_xs: make CUDA work for new version
eacff4aa
iq3_xs: make scalar and AVX2 work for new version
1fef4b8b
iq3_s: make ARM_NEON work with new version
1328331d
iq3_xs: make new version work on metal
17778255
iq3_xs: tiny Metal speed improvement
87038fe1
iq3_xs: tiny Metal speed improvement
4d5feebe
Fix stupid warning
b25f9960
Q3_K_XS now uses a mix of IQ3_XS and IQ3_XXS
272c7f77
iq3_xs: rename to iq3_s
2730225c
iq3_s: make tests pass
47cf30b0
Move Q3_K_XS mix to 3.25 bpw
cd6a0f08
Attempt to fix failing tests
436a146f
Another attempt to fix the Windows builds
303f3f32
ikawrakow
Artefact2
Nexesenex
Xonar92
Attempt to fix ROCm
0d6d185e
Artefact2
Artefact2 commented on 2024-02-23
ROCm again
1d47de32
ggerganov
ikawrakow
askmyteapot
ikawrakow
ggerganov
ikawrakow
ggerganov
iq3_s: partial fix for QK_K = 64
e6e61e31
ikawrakow
iq3_s: make it work on metal for QK_K = 64
cbd950b2
sorasoras
Will this fix ROCm?
e1b8efb9
sorasoras
PeterReid
ikawrakow
JianbangZ
PeterReid
cebtenzzre
ikawrakow
sorasoras
Artefact2
PeterReid
ggerganov
ggerganov approved these changes on 2024-02-24
thorvaldur-arnar
ikawrakow merged 4c4cb307 into master 1 year ago
ikawrakow deleted the ik/iq3_xs_new2 branch 1 year ago
dranger003
ikawrakow
dranger003
mofosyne added Review Complexity : High
mofosyne added Tensor Encoding Scheme
