onnxruntime
[webgpu] Enable FlashAttention for GQA
#23761
Merged


guschmue merged 12 commits into main from copyKVCache
qjia7 update the copyKVCache to support GQA
82dc919b
qjia7 update flashAttention to make it work with GQA
abbbac61
qjia7 fix bugs in copyKVCache
20d0fa93
qjia7 requested a review from sushraja-msft 1 year ago
qjia7 requested a review from guschmue 1 year ago
qjia7 use valid_present_shape to reduce some useless dispatches
6bd23817
github-actions commented on 2025-02-20
qjia7 fix the accuracy issue
0bba2e62
qjia7 fix lint error
860b3c03
guschmue added the ep:WebGPU label
sushraja-msft commented on 2025-02-20
qjia7 address comments
af09013a
qjia7 fix errors when kv_num_heads is smaller than num_heads
7c56409c
qjia7 nits
0e42fa32
qjia7 Merge branch 'main' into copyKVCache
055ff8f8
github-actions commented on 2025-02-21
qjia7 fix lintrunner errors
7189818b
qjia7 requested a review from sushraja-msft 1 year ago
sushraja-msft approved these changes on 2025-02-22
qjia7 Merge branch 'main' into copyKVCache
d393a999
guschmue approved these changes on 2025-02-22
guschmue merged commit 9799c3fb into main 1 year ago
guschmue deleted the copyKVCache branch 1 year ago


Assignees: No one assigned