llama.cpp
sycl : port multi-column MMVQ from CUDA backend (~45% speculative decoding speedup on Intel Arc)
#21845
Merged
