llama.cpp
Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache
#6183
Merged
Commits
k_cache: be able to use Q5_0
Iwan Kawrakow
committed
2 years ago
k_cache: be able to use Q5_1 on CUDA
Iwan Kawrakow
committed
2 years ago
k_cache: be able to use Q5_0 on Metal
Iwan Kawrakow
committed
2 years ago
k_cache: be able to use Q5_1 on Metal
Iwan Kawrakow
committed
2 years ago
k_cache: be able to use IQ4_NL - just CUDA for now
Iwan Kawrakow
committed
2 years ago
k_cache: be able to use IQ4_NL on Metal
Iwan Kawrakow
committed
2 years ago
k_cache: add newly added supported types to llama-bench and CUDA supports_op
Iwan Kawrakow
committed
2 years ago