llama.cpp
[CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full
#19042

Merged

ggerganov merged 4 commits into ggml-org:master from gaugarg-nv:pp_perf_improve

[CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full

d3298dc3

Set the env variable in the CUDA backend registry allocation

29c73efe

github-actions added the Nvidia GPU label

github-actions added the ggml label

Add link to PR in code comment

14de97eb

JohannesGaessler commented on 2026-01-24

Remove warning logs and update documentation

ed2e4840

github-actions added the documentation label

JohannesGaessler approved these changes on 2026-01-26

ggerganov approved these changes on 2026-01-26

ggerganov merged a83c73a1 into master 143 days ago

gaugarg-nv deleted the pp_perf_improve branch 59 days ago

Reviewers

ggerganov

JohannesGaessler

Assignees

No one assigned

Labels

documentation, Nvidia GPU, ggml

Milestone

No milestone