llama.cpp
Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache
#6183

Merged

Commits

k_cache: be able to use Q5_0

Iwan Kawrakow committed 2 years ago
k_cache: be able to use Q5_1 on CUDA

Iwan Kawrakow committed 2 years ago
k_cache: be able to use Q5_0 on Metal

Iwan Kawrakow committed 2 years ago
k_cache: be able to use Q5_1 on Metal

Iwan Kawrakow committed 2 years ago
k_cache: be able to use IQ4_NL - just CUDA for now

Iwan Kawrakow committed 2 years ago
k_cache: be able to use IQ4_NL on Metal

Iwan Kawrakow committed 2 years ago
k_cache: add newly added supported types to llama-bench and CUDA supports_op

Iwan Kawrakow committed 2 years ago