llama.cpp
≈65% speedup of the AVX-512 implementation of `ggml_vec_dot_q4_0()`
#933
Merged
