llama.cpp
Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache
#6183
Merged

ikawrakow merged 7 commits into master from ik/k_cache_q5
ikawrakow
k_cache: be able to use Q5_0
5e09ce41
k_cache: be able to use Q5_1 on CUDA
5d8822b0
k_cache: be able to use Q5_0 on Metal
fef4a23e
k_cache: be able to use Q5_1 on Metal
d68030b8
k_cache: be able to use IQ4_NL - just CUDA for now
9711e1ee
k_cache: be able to use IQ4_NL on Metal
d8a498dc
ggerganov
ggerganov approved these changes on 2024-03-20
ggerganov requested a review from slaren 1 year ago
slaren
slaren commented on 2024-03-20
slaren
slaren approved these changes on 2024-03-20
slaren
slaren commented on 2024-03-20
k_cache: add newly added supported types to llama-bench and CUDA supp…
9e1bda93
sorasoras
ikawrakow
ikawrakow merged 76aa30a2 into master 1 year ago
ikawrakow deleted the ik/k_cache_q5 branch 1 year ago
