llama.cpp
dd0eabc0 - ggml : use full range for Q4_0 and Q4_2 quantization (#729)

Commit

3 years ago

ggml : use full range for Q4_0 and Q4_2 quantization (#729)

* Use full range for q4_0 quantization

By keeping the sign of the highest magnitude, we can make sure the highest value maps to -8, which is currently unused. This is a bit of a freebie since it is fully backwards compatible with the current format.

* Update quantize_row_q4_0 for AVX/AVX2

* Update quantize_row_q4_0 for WASM

Untested

* Update quantize_row_q4_0 for Arm NEON

* Update quantize_row_q4_0 for PowerPC

Untested

* Use full range for q4_2 quantization

References

#729 - Use full range for q4_0 quantization

Author

unbounded

Parents

54bb60e2
