llama.cpp
2d5db483 - ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508)

Commit

2 years ago

ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508)

* ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0
* llama : bump LLAMA_FILE_VERSION to 3
* cuda : update Q4 and Q8 dequantize kernels
* ggml : fix AVX dot products
* readme : update performance table + hot topics
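To illustrate why storing the per-block scale as F16 rather than F32 matters, here is a minimal sketch of the block layouts before and after. It is not the code from this commit: the type names (`q4_0_f32`, `q4_0_f16`, and so on), the `fp16_bits` stand-in, and the helper `bits_per_weight` are hypothetical, and the sketch assumes 32 weights per block with one scale `d` per block (plus a minimum `m` for Q4_1).

```c
#include <stddef.h>
#include <stdint.h>

#define QK 32 /* weights per block (assumption for this sketch) */

/* Stand-in for a half-precision float: only the raw 16 bits are stored. */
typedef uint16_t fp16_bits;

/* Layouts with F32 block parameters (the "before" case). */
typedef struct { float d;          uint8_t qs[QK / 2]; } q4_0_f32; /* 4-bit quants, two per byte */
typedef struct { float d; float m; uint8_t qs[QK / 2]; } q4_1_f32; /* scale d and minimum m */
typedef struct { float d;          int8_t  qs[QK];     } q8_0_f32; /* 8-bit quants */

/* Same layouts with F16 block parameters (the "after" case). */
typedef struct { fp16_bits d;              uint8_t qs[QK / 2]; } q4_0_f16;
typedef struct { fp16_bits d; fp16_bits m; uint8_t qs[QK / 2]; } q4_1_f16;
typedef struct { fp16_bits d;              int8_t  qs[QK];     } q8_0_f16;

/* Average storage cost per weight, in bits, for a block of the given size. */
static double bits_per_weight(size_t block_bytes) {
    return 8.0 * (double)block_bytes / (double)QK;
}
```

On platforms where `float` is 4 bytes, these sketch layouts shrink from 20 to 18 bytes (Q4_0), 24 to 20 bytes (Q4_1), and 36 to 34 bytes (Q8_0). For example, `bits_per_weight(sizeof(q4_0_f16))` gives 4.5 instead of 5.0. A change in block size alters the on-disk tensor layout, which is presumably why the commit also bumps LLAMA_FILE_VERSION.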

References

#1508 - ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0

Author

ggerganov

Parents

6986c783
