llama.cpp
Vulkan: Tune Flash Attention for MoE on AMD GPUs
#18280
Merged


0cc4m requested a review from jeffbolznv 7 days ago
github-actions added the Vulkan and ggml labels
0cc4m committed c9b4b5ea: vulkan: use fewer FA rows for small cache runs
0cc4m force-pushed from 8e9ebae7 to c9b4b5ea 7 days ago
jeffbolznv approved these changes on 2025-12-22
0cc4m merged 7f459c98 into master 5 days ago
0cc4m deleted the 0cc4m/vulkan-flash-attention-tuning branch 5 days ago
