onnxruntime
Optimize FlashAttention for M4 Max (20x speedup)
#27780
Merged

guschmue merged 13 commits into microsoft:main from xenova:mha-optimizations
xenova Optimize FlashAttention for M4 Max
78d8ccb7
xenova marked this pull request as draft 60 days ago
xenova fixes
1fa33c91
xenova changed the title from "Optimize FlashAttention for M4 Max (10.8x speedup)" to "Optimize FlashAttention for M4 Max (12x speedup)" 59 days ago
xenova more optimizations
864561ab
xenova changed the title from "Optimize FlashAttention for M4 Max (12x speedup)" to "Optimize FlashAttention for M4 Max (20x speedup)" 59 days ago
xenova marked this pull request as ready for review 59 days ago
qjia7 commented on 2026-03-20
xenova Address comment for lower-end devices
64875156
xenova remove unused is_nvidia parameter
609a9b78
xenova keep original implementation for qualcomm
3b3315ed
guschmue added the ep:WebGPU label
xenova use original approach for everything non-apple
6b75d1fe
xenova cleanup
eaeeea42
xenova keep diff small
8a99f0f8
xenova manually update indentation
2f2ff1a7
xenova add back comment
a3cb6b77
xenova cleanup
90d2f414
qjia7 commented on 2026-03-24
xenova cap max_k_step_ to 32 on apple hardware
4334a23b
tianleiwu requested a review from qjia7 26 days ago
guschmue approved these changes on 2026-05-14
guschmue merged commit 938b6075 into main 5 days ago