llama.cpp
Introduction of CUDA Graphs to LLama.cpp
#6766
Merged

slaren merged 20 commits into ggml-org:master from agray3:ag_cuda_graphs
agray3
agray3 DRAFT: Introduction of CUDA Graphs to LLama.cpp
cec409aa
agray3
phymbert added need feedback
phymbert added performance
phymbert
JohannesGaessler
agray3
ardfork
sorasoras
Engininja2
agray3 Fix issues raised in comments
c8dd0e7c
agray3
agray3
agray3
agray3 Tidied to now only use CUDA runtime (not mixed with driver calls)
800f4fe4
sorasoras
sorasoras
agray3
sorasoras
ardfork
JohannesGaessler
agray3 disable for multi-gpu and batch size > 1
c2691d96
sorasoras
agray3
sorasoras
agray3 Disable CUDA graphs for old GPU arch and with env var
df4719ec
agray3
jdecourval
JohannesGaessler
agray3 added missing CUDA_CHECKs
c3d4ead1
agray3
JohannesGaessler
JohannesGaessler commented on 2024-04-24
agray3 Addressed comments
d403b180
agray3 changed the title from "DRAFT: Introduction of CUDA Graphs to LLama.cpp" to "Introduction of CUDA Graphs to LLama.cpp" 1 year ago
agray3 further addressed comments
40875968
slaren
ggerganov
agray3 limit to GGML_ALLOW_CUDA_GRAPHS defined in llama.cpp cmake
0640427f
agray3
slaren
agray3
JohannesGaessler
agray3
agray3 Merge branch 'ggerganov:master' into ag_cuda_graphs
9c578616
github-actions
slaren
agray3 Added more comprehensive graph node checking
d44e0fb2
agray3
agray3 With mechanism to fall back if graph capture fails
eb9f15fb
agray3
agray3 Revert "With mechanism to fall back if graph capture fails"
909e4c66
agray3
slaren
slaren
agray3 Fall back if graph capture fails and address other comments
58199503
agray3
agray3 Merge branch 'ggerganov:master' into ag_cuda_graphs
44af0964
slaren
slaren Merge remote-tracking branch 'origin/master' into ag_cuda_graphs
4e1f2a03
slaren - renamed GGML_ALLOW_CUDA_GRAPHS to GGML_CUDA_USE_GRAPHS
e830949e
slaren
slaren fix build without cuda graphs
a4c9b901
agray3
ggerganov
ggerganov commented on 2024-05-08
slaren remove outdated comment
ab40e667
slaren
slaren replace minimum cc value with a constant
f42312e0
slaren
slaren approved these changes on 2024-05-08
JohannesGaessler
JohannesGaessler commented on 2024-05-08
agray3
JohannesGaessler
JohannesGaessler approved these changes on 2024-05-08
ggerganov
slaren
slaren merged bc4bba36 into master 1 year ago
fat-tire
JohannesGaessler
mofosyne added Review Complexity : High
slaren
JohannesGaessler
slaren
agray3
JohannesGaessler
agray3
agray3
slaren
agray3
