llama.cpp
dc020985 - Avoid unnecessarily disabling CUDA graphs (#7302)

Avoid unnecessarily disabling CUDA graphs (#7302)

As discussed in PR #6766, CUDA graphs were being disabled in the presence of long prompts. This commit fixes the issue by preventing the consecutive-update counter from incrementing unnecessarily for tokens in which CUDA graphs are disabled due to batch size > 1.
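A minimal sketch of the logic described above, assuming a per-context counter and threshold; the identifiers (`consecutive_updates`, `max_consecutive_updates`, `track_graph_update`) and the threshold value are illustrative assumptions, not the actual llama.cpp code:

```cpp
// Hypothetical sketch of the consecutive-update tracking from the commit message.
// CUDA graphs are only used for single-token (batch size 1) decoding; the fix is
// that prompt batches no longer count against the consecutive-update limit.
struct cuda_graph_state {
    bool graphs_disabled     = false; // permanently fall back to regular launches
    int  consecutive_updates = 0;     // graph updates seen in a row
    static const int max_consecutive_updates = 4; // assumed threshold
};

// Called once per token evaluation.
void track_graph_update(cuda_graph_state & st, int batch_size, bool graph_needed_update) {
    if (batch_size > 1) {
        // Graphs are not used for prompt batches, so do not let these tokens
        // increment the counter (previously they did, disabling graphs after
        // a long prompt even though decoding could have reused them).
        return;
    }
    if (graph_needed_update) {
        if (++st.consecutive_updates >= st.max_consecutive_updates) {
            st.graphs_disabled = true; // too many updates in a row: give up on graphs
        }
    } else {
        st.consecutive_updates = 0; // a clean graph reuse resets the streak
    }
}
```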