llama.cpp
IQ4_XS: a 4.25 bpw quantization
#5747
Merged
ikawrakow merged 11 commits into master from ik/iq4_nl_xs
67264b3b  Try IQ4_NL with blocks of 64 - does not look good
2b21d37a  iq4_xs: go to super-blocks of 256 and 6-bit scales for blocks of 32
fddbfe83  iq4_xs: CUDA works - 133.2 t/s
061a16f5  iq4_xs: AVX2 dot product
a37980c3  iq4_xs: ARM_NEON dot product
ad40ae63  iq4_nl: Metal implementation
6c2b233b  iq3_xs: minor fix
5c2b2305  iq4_xs: shrink by using IQ3_S for attn_k and attn_q
f162fcaf  iq4_xs: revert using IQ3_S for attn_k and attn_v
801f998b  Fix CI
d7bb4b6d  iq4_xs: Added forgotten check for 256 divisibility
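For context, the 4.25 bpw figure in the title follows directly from the super-block layout described in the commits above: 256 weights per super-block, 4 bits per weight, a 6-bit scale for each block of 32, and one fp16 scale per super-block. Below is a minimal sketch in C of that layout, assuming the ggml block-struct convention; the field names are illustrative, not necessarily the ones in the merged code.

#include <stdint.h>

#define QK_K 256  /* super-block size used by the k-quants */

/* Sketch of one IQ4_XS super-block covering QK_K = 256 weights,
 * split into 8 blocks of 32, each with its own 6-bit scale. */
typedef struct {
    uint16_t d;                 /* fp16 super-block scale                  (16 bits)   */
    uint16_t scales_h;          /* high 2 bits of the eight 6-bit scales   (16 bits)   */
    uint8_t  scales_l[QK_K/64]; /* low 4 bits of the eight 6-bit scales    (32 bits)   */
    uint8_t  qs[QK_K/2];        /* 256 4-bit indices into a non-linear grid (1024 bits) */
} block_iq4_xs;

/* Bit accounting: (16 + 16 + 32 + 1024) / 256 = 1088 / 256 = 4.25 bpw,
 * matching the figure in the PR title. */

The 6-bit block scales account for the difference from plain 4-bit storage: 8 blocks x 6 bits plus the fp16 super-block scale add 64 bits, i.e. 0.25 bits per weight on top of the 4-bit quants.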
ggerganov approved these changes on 2024-02-27
ikawrakow merged 0becb22a into master 1 year ago
ikawrakow deleted the ik/iq4_nl_xs branch 1 year ago
mofosyne added the Review Complexity : High label
mofosyne added the Tensor Encoding Scheme label
Reviewers: ggerganov
Assignees: No one assigned
Labels: Review Complexity : High, Tensor Encoding Scheme
Milestone: No milestone