llama.cpp
CUDA: Improve performance via less synchronizations between token
#17795
Merged

aendk
github-actions added Nvidia GPU
github-actions added ggml
wishstudio
aendk
wishstudio
ggerganov
JohannesGaessler
JohannesGaessler commented on 2025-12-08
JohannesGaessler
ggerganov
aendk force pushed from f6b408d8 to c20e7b4b 173 days ago
aendk force pushed from 5d26313d to 1233fdda 172 days ago
aendk
aendk marked this pull request as ready for review 172 days ago
aendk changed the title from "[DRAFT] CUDA: Improve performance via less synchronizations between token" to "CUDA: Improve performance via less synchronizations between token" 172 days ago
jeffbolznv
aendk
aendk marked this pull request as draft 171 days ago
aendk
aendk
aendk force pushed from 1233fdda to 402502f4 147 days ago
aendk force pushed from 402502f4 to 459cb40f 147 days ago
aendk
aendk marked this pull request as ready for review 147 days ago
ggerganov
jeffbolznv
jeffbolznv commented on 2026-01-14
aendk
ggerganov
ggerganov commented on 2026-01-15
aendk
aendk force pushed from d9416246 to 50344142 140 days ago
aendk
ggerganov
ggerganov commented on 2026-01-22
JohannesGaessler
JohannesGaessler
aendk Adds CPU-to-CUDA copy capability to
d5e3b24a
aendk Adds function to relax sync requirements between input copies on
76deb1f1
aendk Exchanges synchronous copy with async copy function.
2b4b80d0
aendk Adds macro guards to allow compilation in non-CUDA builds
d8cebf68
aendk Reworked backend detection in ggml-backend.cpp to avoid linking
51e8cc79
aendk Relax requirement of checks in async CUDA copies from backend and buf…
e51e58fe
aendk Minor cleanup
1419050d
aendk Makes opt-in to relax use of explicit syncs more general. Backends like
353daedd
aendk Reintroduces stricter check for CPU->CUDA backend async copy via
34cf1027
aendk Corrects initialization of ggml_backend_sync_mode in
c3998dc0
aendk Simplifies synchronizations to adhere to `saaasg` pattern.
d58d96b7
aendk Apply suggestion from @ggerganov (src->buffer to buf_src)
887c2fb0
aendk Apply suggestion from @ggerganov (src->buffer to buf_src) v2
81c96185
aendk force pushed from 3ecffb2a to 81c96185 120 days ago
ggerganov
JohannesGaessler
jeffbolznv
ggerganov
ggerganov approved these changes on 2026-03-05
ggerganov merged 2cd20b72 into master 96 days ago
