llama.cpp
vulkan: optimize flash attention split_k_reduce #14554
Merged

0cc4m merged 2 commits into ggml-org:master from jeffbolznv:fa_split_k_opts
Commits (jeffbolznv):
vulkan: allow FA split_k with smaller KV values (314e0e61)
vulkan: spread split_k_reduce work across more threads (8f24cd9a)
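For context on what these commits touch, below is a minimal scalar sketch, in C++, of the reduction a split_k flash-attention pass typically performs after each split has processed its slice of the KV cache. The struct and function names are hypothetical and do not mirror the actual Vulkan shader changed in this PR; the per-element loop over the head dimension is the kind of work that can be spread across more shader threads, which is what the second commit targets.

```cpp
// Scalar reference (illustration only) of a split_k reduce step for
// flash attention: combine partial results from k splits of the KV cache
// into one normalized output row. Names and layout are hypothetical.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct SplitResult {
    std::vector<float> O; // partial, un-normalized output of length D
    float m;              // max score seen by this split
    float L;              // sum of exp(score - m) over this split's KV slice
};

std::vector<float> split_k_reduce(const std::vector<SplitResult> & parts, size_t D) {
    // Global running max keeps the exponentials numerically stable.
    float m = -INFINITY;
    for (const auto & p : parts) {
        m = std::max(m, p.m);
    }

    std::vector<float> O(D, 0.0f);
    float L = 0.0f;
    for (const auto & p : parts) {
        const float scale = std::exp(p.m - m); // rescale split to the global max
        L += scale * p.L;
        for (size_t d = 0; d < D; ++d) {
            O[d] += scale * p.O[d];            // per-element work a shader can
        }                                      // distribute across threads
    }

    for (size_t d = 0; d < D; ++d) {
        O[d] /= L;                             // final softmax normalization
    }
    return O;
}
```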
jeffbolznv requested a review from 0cc4m 193 days ago
github-actions added the Vulkan label
github-actions added the ggml label
0cc4m approved these changes on 2025-07-08
0cc4m merged 6efcd659 into master 191 days ago
