llama.cpp
7c5bfd57 - Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (#8943)

Commit

1 year ago

Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (#8943)

* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.

- Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
- ggml_vk_sync_buffer introduces a full pipeline sync, which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Adding only barriers for shader reads/writes and transfers seems to be sufficient, judging by the code, which either launches compute kernels or copies tensors.

* Fix small typo

---------

Co-authored-by: 0cc4m <picard12@live.de>
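The first point in the message, removing temporary std::vector allocations from a hot path, can be illustrated with a minimal sketch. This is not the code from the commit: BufferRange, total_size_allocating and total_size_reusing are hypothetical names invented for the example. It assumes only the standard guarantee that std::vector::clear() leaves capacity unchanged, so a caller-owned scratch vector stops allocating once it has grown to its working size.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a small per-dispatch record (not a ggml type).
struct BufferRange {
    std::size_t offset;
    std::size_t size;
};

// Allocating variant: a fresh std::vector is built on every call, so each
// call pays for at least one heap allocation and one deallocation.
std::size_t total_size_allocating(std::size_t n) {
    std::vector<BufferRange> ranges;
    for (std::size_t i = 0; i < n; ++i) {
        ranges.push_back({i * 256, 256});
    }
    std::size_t total = 0;
    for (const BufferRange& r : ranges) {
        total += r.size;
    }
    return total;
}

// Reusing variant: the caller owns the scratch vector. clear() destroys the
// elements but keeps the capacity, so repeated calls with the same n do not
// touch the allocator after the first one.
std::size_t total_size_reusing(std::vector<BufferRange>& scratch, std::size_t n) {
    scratch.clear();
    for (std::size_t i = 0; i < n; ++i) {
        scratch.push_back({i * 256, 256});
    }
    std::size_t total = 0;
    for (const BufferRange& r : scratch) {
        total += r.size;
    }
    return total;
}
```

Both variants compute the same result; the difference only shows up in a profiler as time spent in the allocator. Whether the commit used a reused vector, a fixed-size array, or another approach is not stated in the message.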

References

#8943 - Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.

Author

mtavenrath

Parents

6e02327e
