[webgpu] Flash attention for generation #23808
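Background for the timeline below (a standard description of flash decoding, not taken from this PR's code): during generation the query is a single token, so flash decoding splits the KV sequence into chunks, computes a partial online softmax and partial output per chunk in parallel, and then reduces the partials. A minimal sketch of that reduction, in notation of my own choosing, which the XXXSplitVxScore naming later in the timeline appears to refer to:

For split $j$ with scores $s_{j,i} = q \cdot k_i / \sqrt{d}$ over that split's keys:
$m_j = \max_i s_{j,i}, \quad \ell_j = \sum_i e^{s_{j,i}-m_j}, \quad o_j = \sum_i e^{s_{j,i}-m_j} v_i$
Reduction across splits:
$m = \max_j m_j, \quad \ell = \sum_j e^{m_j-m} \ell_j, \quad o = \frac{1}{\ell} \sum_j e^{m_j-m} o_j$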
[webgpu] Add flash decoding (f0424fd1)
qjia7 force-pushed from 6f6d6d15 to f0424fd1 317 days ago
fix CI errors (48affd3b)
qjia7 changed the title from [WIP] Flash attention for generation to [webgpu] Flash attention for generation 316 days ago
limit it to static kv cache (96aaa897)
qjia7 marked this pull request as ready for review 316 days ago
Merge branch 'main' into attention_generate_fa_good (a97ad569)
remove the limitations (40aa7ada)
Merge branch 'main' into attention_generate_fa_good (99df2e9d)
Use 1D dispatch group size (e9c18db9)
qjia7 marked this pull request as draft 308 days ago
add annotations (c96e925d)
qjia7 marked this pull request as ready for review 308 days ago
Use similar var name with matmul (0fb5c2f0)
Merge branch 'main' into attention_generate_fa_good (0d3a7381)
update cache hints (7cbed5fa)
address comments (2526992b)
qjia7 commented on 2025-03-27
address comments (922ca1b9)
Merge branch 'main' into attention_generate_fa_good (51595805)
Rename XXXSplitK to XXXSplitVxScore (2f4a1f79)
Modify the comments (acbf5442)
guschmue dismissed these changes on 2025-04-04
Merge branch 'main' into attention_generate_fa_good (639a2abd)
address comments (191cf414)
qjia7 dismissed their stale review via 191cf414 289 days ago
qjia7 commented on 2025-04-07
sushanthr approved these changes on 2025-04-08
guschmue approved these changes on 2025-04-08
guschmue merged commit 18f91e55 into main 288 days ago
guschmue deleted the attention_generate_fa branch 288 days ago