CUDA: Improve performance via less synchronizations between token #17795
aendk force-pushed from f6b408d8 to c20e7b4b 173 days ago
aendk force-pushed from 5d26313d to 1233fdda 172 days ago
aendk marked this pull request as ready for review 172 days ago
aendk changed the title from "[DRAFT] CUDA: Improve performance via less synchronizations between token" to "CUDA: Improve performance via less synchronizations between token" 172 days ago
aendk marked this pull request as draft 171 days ago
aendk force-pushed from 1233fdda to 402502f4 147 days ago
aendk force-pushed from 402502f4 to 459cb40f 147 days ago
aendk marked this pull request as ready for review 147 days ago
aendk force-pushed from d9416246 to 50344142 140 days ago
d5e3b24a  Adds CPU-to-CUDA copy capability to …
76deb1f1  Adds function to relax sync requirements between input copies on …
2b4b80d0  Exchanges synchronous copy with async copy function.
d8cebf68  Adds macro guards to allow compilation in non-CUDA builds
51e8cc79  Reworked backend detection in ggml-backend.cpp to avoid linking …
e51e58fe  Relax requirement of checks in async CUDA copies from backend and buf…
1419050d  Minor cleanup
353daedd  Makes opt-in to relax use of explicit syncs more general. Backends like …
34cf1027  Reintroduces stricter check for CPU->CUDA backend async copy via …
c3998dc0  Corrects initialization of ggml_backend_sync_mode in …
d58d96b7  Simplifies synchronizations to adhere to `saaasg` pattern.
887c2fb0  Apply suggestion from @ggerganov (src->buffer to buf_src)
81c96185  Apply suggestion from @ggerganov (src->buffer to buf_src) v2
aendk force-pushed from 3ecffb2a to 81c96185 120 days ago
ggerganov approved these changes on 2026-03-05
ggerganov merged 2cd20b72 into master 96 days ago
Assignees: No one assigned