PR #5676 IQ3_S: a much better alternative to Q3_K

ikawrakow merged 27 commits into master from ik/iq3_xs_new2

iq4_nl: squash commits for easier rebase

10a47fa6

Resurrecting iq3_xs

5691fecd

Minor PPL improvement via a block scale fudge factor

76aff093

Minor improvement via 3 neighbours

5be4e7ac

iq3_xs: working scalar and AVX2 dot products

f1255c50

iq3_xs: ARM_NEON dot product - works but extremely slow (10 t/s)

76214ab6

iq3_xs: working Metal implementation

38aa7b17

Adding IQ3_M - IQ3_XS mix with mostly Q4_K

2ec600b7

iq3_xs: a 3.4375 bpw variant

d83fddaa
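The 3.4375 bpw figure can be sanity-checked with byte arithmetic. This is a minimal sketch that assumes ggml's default super-block size QK_K = 256 and an IQ3_S block made of an fp16 scale, 64 bytes of low grid-index bits, 8 bytes of high index bits, 32 bytes of sign bits, and 4 bytes of 4-bit block scales. These field sizes are an assumption for illustration, not a quotation of the PR's struct.

```python
# Assumed super-block size (ggml's default QK_K); the PR also handles QK_K = 64.
QK_K = 256

def bits_per_weight(block_bytes: int, weights_per_block: int = QK_K) -> float:
    """Storage cost of one quantized super-block, in bits per weight."""
    return block_bytes * 8 / weights_per_block

# Assumed IQ3_S layout: byte count of each field per 256-weight super-block.
iq3_s_fields = {
    "d (fp16 super-block scale)": 2,
    "qs (low 8 bits of grid indices)": QK_K // 4,   # 64 bytes
    "qh (high bit of grid indices)": QK_K // 32,    # 8 bytes
    "signs (one bit per weight)": QK_K // 8,        # 32 bytes
    "scales (4-bit block scales)": QK_K // 64,      # 4 bytes
}

iq3_s_bytes = sum(iq3_s_fields.values())
print(iq3_s_bytes, bits_per_weight(iq3_s_bytes))  # 110 3.4375
```

Under these assumptions, 110 bytes hold 256 weights, and 110 * 8 / 256 = 3.4375 bits per weight.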

iq3_xs: make CUDA work for new version

eacff4aa

iq3_xs: make scalar and AVX2 work for new version

1fef4b8b

iq3_s: make ARM_NEON work with new version

1328331d

iq3_xs: make new version work on metal

17778255

iq3_xs: tiny Metal speed improvement

87038fe1

iq3_xs: tiny Metal speed improvement

4d5feebe

Fix stupid warning

b25f9960

Q3_K_XS now uses a mix of IQ3_XS and IQ3_XXS

272c7f77

iq3_xs: rename to iq3_s

2730225c

iq3_s: make tests pass

47cf30b0

Move Q3_K_XS mix to 3.25 bpw

cd6a0f08
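The bpw of a quantization mix is the weight-count-weighted average of its component types. The sketch below solves for the fraction of weights that would need the larger type to hit a 3.25 bpw target. It assumes IQ3_XXS at 3.0625 bpw and IQ3_S at 3.4375 bpw, and simplifies by assuming the mix uses only these two types. The real Q3_K_XS recipe assigns types per tensor and may include others, so treat the resulting fraction as illustrative only.

```python
IQ3_XXS_BPW = 3.0625  # assumed: 98 bytes per 256 weights
IQ3_S_BPW = 3.4375    # assumed: 110 bytes per 256 weights

def mix_bpw(frac_high: float, bpw_high: float, bpw_low: float) -> float:
    """Average bits per weight when frac_high of all weights use the larger type."""
    return frac_high * bpw_high + (1.0 - frac_high) * bpw_low

def frac_for_target(target: float, bpw_high: float, bpw_low: float) -> float:
    """Fraction of weights that must use the larger type to reach a target average."""
    if not bpw_low <= target <= bpw_high:
        raise ValueError("target must lie between the two types' bpw")
    return (target - bpw_low) / (bpw_high - bpw_low)

f = frac_for_target(3.25, IQ3_S_BPW, IQ3_XXS_BPW)
print(f, mix_bpw(f, IQ3_S_BPW, IQ3_XXS_BPW))  # 0.5 3.25
```

In this two-type simplification, 3.25 bpw sits exactly halfway between the two types, so half of the weights would use IQ3_S.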

Attempt to fix failing tests

436a146f

Another attempt to fix the Windows builds

303f3f32

Attempt to fix ROCm

0d6d185e

Artefact2 commented on 2024-02-23

ROCm again

1d47de32

iq3_s: partial fix for QK_K = 64

e6e61e31

iq3_s: make it work on metal for QK_K = 64

cbd950b2

Will this fix ROCm?

e1b8efb9

ggerganov approved these changes on 2024-02-24

ikawrakow merged 4c4cb307 into master 2 years ago

ikawrakow deleted the ik/iq3_xs_new2 branch 2 years ago

mofosyne added Review Complexity : High

mofosyne added Tensor Encoding Scheme

Reviewers

ggerganov

Artefact2
