Optimize FlashAttention for M4 Max (20x speedup) #27780
Optimize FlashAttention for M4 Max
78d8ccb7
xenova marked this pull request as draft 60 days ago
fixes
1fa33c91
xenova changed the title from "Optimize FlashAttention for M4 Max (10.8x speedup)" to "Optimize FlashAttention for M4 Max (12x speedup)" 59 days ago
more optimizations
864561ab
xenova changed the title from "Optimize FlashAttention for M4 Max (12x speedup)" to "Optimize FlashAttention for M4 Max (20x speedup)" 59 days ago
xenova marked this pull request as ready for review 59 days ago
qjia7 commented on 2026-03-20
Address comment for lower-end devices
64875156
remove unused is_nvidia parameter
609a9b78
keep original implementation for qualcomm
3b3315ed
use original approach for everything non-apple
6b75d1fe
cleanup
eaeeea42
keep diff small
8a99f0f8
manually update indentation
2f2ff1a7
add back comment
a3cb6b77
cleanup
90d2f414
qjia7 commented on 2026-03-24
cap max_k_step_ to 32 on apple hardware
4334a23b
guschmue approved these changes on 2026-05-14
guschmue merged commit 938b6075 into main 5 days ago