onnxruntime
[CUDA] Support head_sink in flash attention for GQA
#25432
Merged
tianleiwu merged 4 commits into main from tlwu/gqa_head_sink_cuda
tianleiwu support head sink in flash attention for GQA
ed29822a
github-advanced-security commented on 2025-07-17
tianleiwu update comments
8a8ee9f4
tianleiwu remove unused script
cef06642
tianleiwu fix build
1cf1aa78
kunal-vaishnavi commented on 2025-07-17
kunal-vaishnavi commented on 2025-07-17
kunal-vaishnavi approved these changes on 2025-07-17
tianleiwu merged e6c84b80 into main 248 days ago
tianleiwu deleted the tlwu/gqa_head_sink_cuda branch 248 days ago
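For context on what the PR title refers to: an attention "sink" is commonly implemented as an extra per-head logit appended to the attention scores before the softmax, letting each head shed probability mass instead of forcing it onto real tokens. The sketch below illustrates only that general idea in NumPy; the function name, shapes, and broadcasting are illustrative assumptions and are not onnxruntime's GQA kernel API.

```python
import numpy as np

def attention_probs_with_head_sink(scores, head_sink):
    # scores: (num_heads, q_len, kv_len) raw attention logits
    # head_sink: (num_heads,) one learnable sink logit per head (hypothetical layout)
    h, q, k = scores.shape
    sink = np.broadcast_to(head_sink[:, None, None], (h, q, 1))
    # Append the sink logit as an extra "virtual" key column.
    logits = np.concatenate([scores, sink], axis=-1)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    # Drop the sink column: each row now sums to less than 1, with the
    # remainder absorbed by the sink.
    return probs[..., :k]

rng = np.random.default_rng(0)
scores = rng.standard_normal((2, 3, 4)).astype(np.float32)
p = attention_probs_with_head_sink(scores, np.zeros(2, dtype=np.float32))
```

Because the sink takes part in the softmax normalization but is discarded afterward, the returned probabilities over real keys sum to strictly less than 1 per row.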
