PR #6766 Introduction of CUDA Graphs to LLama.cpp

DRAFT: Introduction of CUDA Graphs to LLama.cpp

agray3 committed 1 year ago

Fix issues raised in comments

agray3 committed 1 year ago

Tidied to now only use CUDA runtime (not mixed with driver calls)

agray3 committed 1 year ago

disable for multi-gpu and batch size > 1

agray3 committed 1 year ago

Disable CUDA graphs for old GPU arch and with env var

agray3 committed 1 year ago

added missing CUDA_CHECKs

agray3 committed 1 year ago

Addressed comments

agray3 committed 1 year ago

further addressed comments

agray3 committed 1 year ago

limit to GGML_ALLOW_CUDA_GRAPHS defined in llama.cpp cmake

agray3 committed 1 year ago

Merge branch 'ggerganov:master' into ag_cuda_graphs

agray3 committed 1 year ago

Added more comprehensive graph node checking

agray3 committed 1 year ago

With mechanism to fall back if graph capture fails

agray3 committed 1 year ago

Revert "With mechanism to fall back if graph capture fails"

agray3 committed 1 year ago

Fall back if graph capture fails and address other comments

agray3 committed 1 year ago

Merge branch 'ggerganov:master' into ag_cuda_graphs

agray3 committed 1 year ago

Merge remote-tracking branch 'origin/master' into ag_cuda_graphs

slaren committed 1 year ago

renamed GGML_ALLOW_CUDA_GRAPHS to GGML_CUDA_USE_GRAPHS

slaren committed 1 year ago

fix build without cuda graphs

slaren committed 1 year ago

remove outdated comment

slaren committed 1 year ago

replace minimum cc value with a constant

slaren committed 1 year ago

llama.cpp PR #6766 "Introduction of CUDA Graphs to LLama.cpp": Merged