onnxruntime
Implement Flash Attention 2 for webgpu EP
#23576
Merged


guschmue merged 11 commits into main from user/sushraja/flash_attention2
sushraja-msft requested a review from guschmue 1 year ago
guschmue added the ep:WebGPU label
guschmue dismissed these changes on 2025-02-05
sushraja-msft Port over FA (ed066e1f)
sushraja-msft Attempt FA2 (6b978cbd)
sushraja-msft attempt to fix k-index (655c8b61)
sushraja-msft This FA works (3df32ed2)
sushraja-msft Add comments (71c8d59d)
sushraja-msft Support all sg_size and restrict FA to prefill only. On ADL, WU drive… (05b0f250)
sushraja-msft lint runner (362e969d)
sushraja-msft force-pushed from dc9d7527 to 362e969d 364 days ago
sushraja-msft Remove components (4ab90c31)
sushraja-msft dismissed their stale review via 4ab90c31 364 days ago
sushraja-msft remove half float notation from constants (635cd219)
sushraja-msft Fix Attention bias. (f289c644)
sushraja-msft exclude fa on devices without subgroups (a5fd8a67)
Reviewers: yuslepukhin, sushraja-msft, guschmue
guschmue approved these changes on 2025-02-07
guschmue merged 82840f63 into main 363 days ago
guschmue deleted the user/sushraja/flash_attention2 branch 363 days ago
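For background on the algorithm this PR names: FlashAttention-2 computes attention in tiles over K/V using an online softmax, so the full (queries x keys) score matrix is never materialized. The sketch below is a generic NumPy illustration of that tiled recurrence, not the PR's WGSL shader code; the function and variable names are illustrative only.

```python
import numpy as np

def flash_attention(Q, K, V, tile=2):
    """Tiled attention with an online softmax (FlashAttention-style sketch).

    Q: (n, d), K: (m, d), V: (m, d). K/V are consumed in tiles, keeping
    only per-row running statistics instead of the full (n, m) score matrix.
    """
    n, d = Q.shape
    m = K.shape[0]
    out = np.zeros((n, d))
    row_max = np.full(n, -np.inf)   # running max of scores per query row
    row_sum = np.zeros(n)           # running softmax denominator per row
    scale = 1.0 / np.sqrt(d)
    for start in range(0, m, tile):
        Kt = K[start:start + tile]
        Vt = V[start:start + tile]
        S = (Q @ Kt.T) * scale                    # (n, tile) partial scores
        new_max = np.maximum(row_max, S.max(axis=1))
        correction = np.exp(row_max - new_max)    # rescale old accumulators
        P = np.exp(S - new_max[:, None])          # unnormalized tile weights
        out = out * correction[:, None] + P @ Vt
        row_sum = row_sum * correction + P.sum(axis=1)
        row_max = new_max
    return out / row_sum[:, None]

def reference_attention(Q, K, V):
    """Naive softmax attention, for comparison."""
    S = (Q @ K.T) / np.sqrt(Q.shape[1])
    P = np.exp(S - S.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return P @ V
```

Each tile updates a running row-max and softmax denominator, which is why an implementation on GPU benefits from fast cross-lane reductions; that presumably motivates the commits above that handle all `sg_size` values and exclude devices without subgroup support.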
