llama.cpp
955ef9a5 - ggml : alternative Q4_3 implementation using modified Q8_0 (#1109)

Commit

2 years ago

ggml : alternative Q4_3 implementation using modified Q8_0 (#1109)

* ggml : prefer vzip to vuzp

  This way we always use the same type of instruction across all quantizations

* ggml : alternative Q4_3 implementation using modified Q8_0
* ggml : fix Q4_3 scalar implementation
* ggml : slight improvement of Q4_3 - no need for loop unrolling
* ggml : fix AVX paths for Q8_0 quantization

References

#1109 - ggml : alternative Q4_3 implementation using modified Q8_0

Author

ggerganov

Parents

c5aa5e57
