llama.cpp
7c4263d4 - ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (#5760)

Commit

1 year ago

ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (#5760)

* WIP: make i-quants work for QK_K = 64
* iq2_xs: attempt to fix AVX dot product for QK_K = 64
  Tests pass, but I get gibberish.
* QK_K = 64 tests pass on ARM_NEON and Metal
  Sadly, that does not mean it actually works.
* Make CUDA compile with QK_K = 64
  Tests don't pass, plus we get misaligned access
* Q2_K: fixed bug in imatrix quantization for QK_K = 64
* iq1_s: turn off SIMD implementation for QK_K = 64 (it does not work)

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
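As a rough illustration of what changing the super-block size QK_K from 256 to 64 means, the sketch below does the bookkeeping arithmetic for a hypothetical super-block layout: QK_K weights stored at a fixed number of payload bits each, plus one 16-bit (fp16) scale per super-block. This layout is an assumption made for illustration, not the exact ggml i-quant block struct, and the helper names `bits_per_weight` and `num_super_blocks` are hypothetical.

```python
# Hypothetical layout (assumption, not the real ggml structs): each super-block
# holds QK_K weights at `payload_bits` bits each plus ONE fp16 scale.

FP16_SCALE_BITS = 16  # size of the single per-super-block scale, in bits


def bits_per_weight(qk_k: int, payload_bits: float) -> float:
    """Effective bits per weight once the per-super-block scale is amortized."""
    total_bits = qk_k * payload_bits + FP16_SCALE_BITS
    return total_bits / qk_k


def num_super_blocks(row_len: int, qk_k: int) -> int:
    """Super-blocks needed for one row; the row must be a multiple of QK_K."""
    if row_len % qk_k != 0:
        raise ValueError(f"row length {row_len} is not a multiple of QK_K={qk_k}")
    return row_len // qk_k


if __name__ == "__main__":
    for qk_k in (256, 64):
        # Same row length and payload, different super-block size.
        print(qk_k, num_super_blocks(4096, qk_k), bits_per_weight(qk_k, 2.0))
```

Under this assumed layout, a smaller super-block accepts any row length that is a multiple of 64 (for example 4160, which is not a multiple of 256), at the cost of spreading the fixed per-block scale over fewer weights: 2.25 instead of 2.0625 bits per weight for a 2-bit payload.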

References

#5760 - Make i-quants work with super-blocks of 64 (CPU and Metal)

Author

ikawrakow

Parents

cb49e0f8
