llama.cpp
Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache
#6183

Merged

ikawrakow merged 7 commits into master from ik/k_cache_q5

k_cache: be able to use Q5_0

5e09ce41

k_cache: be able to use Q5_1 on CUDA

5d8822b0

k_cache: be able to use Q5_0 on Metal

fef4a23e

k_cache: be able to use Q5_1 on Metal

d68030b8

k_cache: be able to use IQ4_NL - just CUDA for now

9711e1ee

k_cache: be able to use IQ4_NL on Metal

d8a498dc

ggerganov approved these changes on 2024-03-20

ggerganov requested a review from slaren 1 year ago

slaren commented on 2024-03-20

slaren approved these changes on 2024-03-20

slaren commented on 2024-03-20

k_cache: add newly added supported types to llama-bench and CUDA supp…

9e1bda93

ikawrakow merged 76aa30a2 into master 1 year ago

ikawrakow deleted the ik/k_cache_q5 branch 1 year ago

Reviewers

slaren

ggerganov
