llama.cpp
Make i-quants work with super-blocks of 64 (CPU and Metal)
#5760

Merged

ggerganov merged 6 commits into master from ik/i-quants-64

WIP: make i-quants work for QK_K = 64

13ba37f1

iq2_xs: attempt to fix AVX dot product for QK_K = 64

28e6146c

QK_K = 64 tests pass on ARM_NEON and Metal

de64e061

Make CUDA compile with QK_K = 64

2540a290

Q2_K: fixed bug in imatrix quantization for QK_K = 64

47d52b2b

iq1_s: turn off SIMD implementation for QK_K = 64 (it does not work)

f0cbb6dd

ggerganov approved these changes on 2024-02-28

ggerganov merged 7c4263d4 into master 2 years ago

Reviewers

ggerganov
