llama.cpp
sycl : port multi-column MMVQ from CUDA backend (~45% speculative decoding speedup on Intel Arc)
#21845

Merged

ggerganov merged 1 commit into ggml-org:master from masonmilby:sycl-mmvq-multicol

masonmilby requested a review 85 days ago

github-actions added ggml

github-actions added SYCL

masonmilby changed the title ~~sycl : port multi-column MMVQ from CUDA backend (~75% speculative decoding speedup on Intel Arc)~~ sycl : port multi-column MMVQ from CUDA backend (~45% speculative decoding speedup on Intel Arc) 82 days ago

arthw commented on 2026-05-09

masonmilby marked this pull request as draft 59 days ago

sycl : port multi-column MMVQ from CUDA backend

113d79e3

masonmilby force pushed from d5ca0928 to 113d79e3 33 days ago

masonmilby marked this pull request as ready for review 32 days ago

masonmilby requested a review from arthw 32 days ago

arthw approved these changes on 2026-06-05

arthw added merge ready

ggerganov merged 7fe2ae45 into master 32 days ago

Reviewers

arthw

Assignees

No one assigned

Labels

ggml, merge ready, SYCL

Milestone

No milestone
