llama.cpp
76aa30a2 - Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)

Commit

2 years ago

Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)

* k_cache: be able to use Q5_0
* k_cache: be able to use Q5_1 on CUDA
* k_cache: be able to use Q5_0 on Metal
* k_cache: be able to use Q5_1 on Metal
* k_cache: be able to use IQ4_NL - just CUDA for now
* k_cache: be able to use IQ4_NL on Metal
* k_cache: add newly added supported types to llama-bench and CUDA supports_op

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
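To give a rough sense of what the K cache types named in this commit cost in memory, here is a minimal sketch that estimates K-cache size from the ggml block layouts (32 values per block for each of these types). It is not code from the commit: the helper names `bits_per_value` and `k_cache_bytes` and the model dimensions in the usage note are illustrative assumptions, and any per-buffer padding or alignment is ignored.

```python
# Sketch only: estimates K-cache size from ggml block sizes.
# Helper names and example dimensions are hypothetical, not from the commit.

BLOCK_ELEMS = 32  # values per block for every type listed here

# Bytes per 32-value block.
BLOCK_BYTES = {
    "f16": 64,     # 32 two-byte halves, unquantized
    "q8_0": 34,    # 2-byte scale + 32 int8 values
    "q5_1": 24,    # 2-byte scale + 2-byte min + 4 bytes of high bits + 16 bytes of low nibbles
    "q5_0": 22,    # 2-byte scale + 4 bytes of high bits + 16 bytes of low nibbles
    "iq4_nl": 18,  # 2-byte scale + 16 bytes of 4-bit indices into a non-linear table
    "q4_0": 18,    # 2-byte scale + 16 bytes of 4-bit values
}


def bits_per_value(cache_type: str) -> float:
    """Average storage cost of one cached value, in bits."""
    return BLOCK_BYTES[cache_type] * 8 / BLOCK_ELEMS


def k_cache_bytes(cache_type: str, n_ctx: int, n_layer: int, n_embd_k: int) -> int:
    """Approximate K-cache size: one row of n_embd_k values per token per layer."""
    if n_embd_k % BLOCK_ELEMS != 0:
        raise ValueError("row length must be a multiple of the block size")
    blocks_per_row = n_embd_k // BLOCK_ELEMS
    return n_ctx * n_layer * blocks_per_row * BLOCK_BYTES[cache_type]


if __name__ == "__main__":
    # Assumed dimensions: 32 layers, K row of 4096 values, 4096-token context.
    for t in ("f16", "q8_0", "q5_1", "q5_0", "iq4_nl"):
        mib = k_cache_bytes(t, n_ctx=4096, n_layer=32, n_embd_k=4096) / 2**20
        print(f"{t:7s} {bits_per_value(t):4.1f} bits/value  {mib:7.1f} MiB")
```

Under those assumed dimensions, an F16 K cache takes 1024 MiB, while Q5_1, Q5_0, and IQ4_NL come to 384, 352, and 288 MiB respectively. In llama.cpp the K-cache type is chosen on the command line with `--cache-type-k` (`-ctk`); check the help output of your build for the accepted values.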

References

#6183 - Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache

Author

ikawrakow

Parents

c5b8595e
