llama.cpp
Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache
#6183
Merged
ikawrakow merged 7 commits into master from ik/k_cache_q5
k_cache: be able to use Q5_0 (5e09ce41)
k_cache: be able to use Q5_1 on CUDA (5d8822b0)
k_cache: be able to use Q5_0 on Metal (fef4a23e)
k_cache: be able to use Q5_1 on Metal (d68030b8)
k_cache: be able to use IQ4_NL - just CUDA for now (9711e1ee)
k_cache: be able to use IQ4_NL on Metal (d8a498dc)
ggerganov approved these changes on 2024-03-20
ggerganov requested a review from slaren (1 year ago)
slaren commented on 2024-03-20
slaren approved these changes on 2024-03-20
slaren commented on 2024-03-20
k_cache: add newly added supported types to llama-bench and CUDA supp… (9e1bda93)
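For context, here is a minimal sketch (not code from this PR) of how one of the newly supported K-cache types could be requested through the public llama.cpp API: the `type_k` field of `llama_context_params` selects the cache type, and `"model.gguf"` is a placeholder path.

```cpp
// Minimal sketch (illustrative only): request a Q5_0-quantized K cache
// via the public llama.cpp API. "model.gguf" is a placeholder path.
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (!model) {
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    // Newly supported K-cache types from this PR: Q5_0, Q5_1, IQ4_NL.
    cparams.type_k = GGML_TYPE_Q5_0;

    llama_context * ctx = llama_new_context_with_model(model, cparams);
    if (!ctx) {
        llama_free_model(model);
        return 1;
    }

    // ... run inference as usual; the K cache is stored quantized ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

On the command line, the same selection should be reachable through the existing `--cache-type-k` (`-ctk`) option in the main example and in llama-bench (e.g. `-ctk q5_0`), assuming those flags accept the new type names after the llama-bench commit above.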
ikawrakow merged 76aa30a2 into master (1 year ago)
ikawrakow deleted the ik/k_cache_q5 branch (1 year ago)
Reviewers: slaren, ggerganov
Assignees: no one assigned
Labels: none
Milestone: none