llama.cpp
f01bd023 - vulkan: Implement split_k for coopmat2 flash attention. (#12627)

Commit
257 days ago
vulkan: Implement split_k for coopmat2 flash attention. (#12627)

When using group query attention, we have one workgroup per KV batch and this can be very few workgroups (e.g. just 8 in some models). Enable split_k to spread the work across SMs. This helps a lot when the KV cache is large.
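To illustrate the idea behind split_k, here is a minimal CPU sketch in Python. It is not the shader code from this commit. It assumes the usual flash-attention bookkeeping (a running maximum and a sum of exponentials per chunk), and every name in it (`attention_reference`, `attention_partial`, `attention_split_k`) is hypothetical. Each KV chunk stands in for the work that one extra workgroup could take on, and a final reduction merges the partial results.

```python
import math


def attention_reference(q, keys, values):
    """Plain softmax attention for one query vector (baseline for comparison)."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_partial(q, keys, values):
    """Process one KV chunk.

    Returns (local max score, sum of exps, unnormalized output), which is
    enough to merge this chunk with the others afterwards.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), out


def attention_split_k(q, keys, values, split_k):
    """Split the KV range into up to split_k chunks and merge the partials."""
    n = len(keys)
    chunk = (n + split_k - 1) // split_k  # ceiling division
    partials = [attention_partial(q, keys[s:s + chunk], values[s:s + chunk])
                for s in range(0, n, chunk)]

    # Reduction step: rescale every partial result to a common maximum.
    m_all = max(m for m, _, _ in partials)
    dim = len(values[0])
    total = 0.0
    out = [0.0] * dim
    for m, l, o in partials:
        scale = math.exp(m - m_all)
        total += l * scale
        for d in range(dim):
            out[d] += o[d] * scale
    return [x / total for x in out]
```

In this sketch the chunks are independent until the final reduction, so the number of parallel work items grows from one per KV batch to `split_k` per KV batch. The price is one extra merge pass, which is why the gain is expected mainly when the KV cache is large.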