[webgpu] Enable FlashAttention for GQA #23761
update the copyKVCache to support GQA
82dc919b
update flashAttention to make it work with GQA
abbbac61
fix bugs in copyKVCache
20d0fa93
use valid_present_shape to reduce some useless dispatches
6bd23817
fix the accuracy issue
0bba2e62
fix lint error
860b3c03
address comments
af09013a
fix errors when kv_num_heads is smaller than num_heads
7c56409c
nits
0e42fa32
Merge branch 'main' into copyKVCache
055ff8f8
fix lintrunner errors
7189818b
Merge branch 'main' into copyKVCache
d393a999
guschmue
approved these changes
on 2025-02-22
guschmue
merged
9799c3fb
into main 1 year ago
guschmue
deleted the copyKVCache branch 1 year ago
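The commits above center on making the KV-cache copy and FlashAttention paths handle grouped-query attention (GQA), where `kv_num_heads` is smaller than `num_heads` and several query heads share one KV head. As a minimal sketch of that head mapping (illustrative values only, not the PR's actual WebGPU shader code):

```python
# Hypothetical illustration of the GQA head-group mapping: each query head
# reads the KV cache of head (query_head // group_size). Values are assumed
# for demonstration; the real kernel operates on tensors, not indices alone.
num_heads = 8        # number of query heads (assumed)
kv_num_heads = 2     # number of KV heads; must evenly divide num_heads
group_size = num_heads // kv_num_heads  # query heads per KV head

def kv_head_for(query_head: int) -> int:
    """Map a query head index to the KV head whose cache it shares."""
    return query_head // group_size

mapping = [kv_head_for(h) for h in range(num_heads)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

When `kv_num_heads == num_heads` the mapping degenerates to the identity, which is standard multi-head attention; the "kv_num_heads is smaller than num_heads" fix above addresses the grouped case.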