llama.cpp
Vulkan: Tune Flash Attention for MoE on AMD GPUs
#18280
Merged


0cc4m requested a review from jeffbolznv 7 days ago
github-actions added the Vulkan and ggml labels
0cc4m committed c9b4b5ea: vulkan: use fewer FA rows for small cache runs
0cc4m force-pushed from 8e9ebae7 to c9b4b5ea 7 days ago
jeffbolznv approved these changes on 2025-12-22
0cc4m merged 7f459c98 into master 5 days ago
0cc4m deleted the 0cc4m/vulkan-flash-attention-tuning branch 5 days ago
