llama.cpp
Simplify and improve CUDA graphs through use of indirect copy pointers
#9017

Merged

slaren merged 9 commits into ggml-org:master from agray3:ag_indirect_copy_dest

github-actions added Nvidia GPU

github-actions added ggml

agray3 marked this pull request as draft 1 year ago

CUDA: Simplify and improve CUDA graphs through use of indirect copy p…

e9a1be0a

agray3 force pushed from 38f4863a to e9a1be0a 276 days ago

agray3 marked this pull request as ready for review 262 days ago

slaren commented on 2025-03-25

Addressed comments

1a2441ad

IMbackK approved these changes on 2025-03-26

slaren requested changes on 2025-03-29

fix HIP builds

a3d13183

properly sync to stream

6d7df919

removed ggml_cuda_cpy_fn_ptrs

04a73070

move stream sync before free

c255a0fd

guard to only use indirection with graphs

21fae96d

style fixes

61622c0e

slaren approved these changes on 2025-04-01

check for errors

fd88d2b1

slaren merged 3f9da22c into master 254 days ago

Reviewers

slaren

IMbackK

Labels

Nvidia GPU, ggml