ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196)

Commit

2 years ago

ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196)

* Make quantize_row_iq4_nl do the same thing as quantization on CUDA

* Make quantize_row_iq4_nl do the same thing as quantization on CUDA

This time for real. backend-ops tests pass.

* Now fix test-quantize-fns

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

References

#6196 - Make IQ4_NL quantization be the same on CPU/CUDA/Metal when quantizing K-cache

Author

ikawrakow

Parents

5b7b0ac8

llama.cpp cfd3be76 - ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196)
