onnxruntime
[CUDA] Support head_sink in flash attention for GQA
#25432
Merged
tianleiwu merged 4 commits into main from tlwu/gqa_head_sink_cuda
tianleiwu support head sink in flash attention for GQA
ed29822a
github-advanced-security commented on 2025-07-17
tianleiwu update comments
8a8ee9f4
tianleiwu remove unused script
cef06642
tianleiwu fix build
1cf1aa78
kunal-vaishnavi commented on 2025-07-17
kunal-vaishnavi commented on 2025-07-17
kunal-vaishnavi approved these changes on 2025-07-17
tianleiwu merged e6c84b80 into main 248 days ago
tianleiwu deleted the tlwu/gqa_head_sink_cuda branch 248 days ago
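For context on what the PR title refers to: an attention "sink" is commonly implemented as an extra per-head logit appended to the attention scores before the softmax, letting each head shed probability mass instead of forcing it onto real tokens. The sketch below illustrates only that general idea in NumPy; the function name, shapes, and broadcasting are illustrative assumptions and are not onnxruntime's GQA kernel API.

```python
import numpy as np

def attention_probs_with_head_sink(scores, head_sink):
    # scores: (num_heads, q_len, kv_len) raw attention logits
    # head_sink: (num_heads,) one learnable sink logit per head (hypothetical layout)
    h, q, k = scores.shape
    sink = np.broadcast_to(head_sink[:, None, None], (h, q, 1))
    # Append the sink logit as an extra "virtual" key column.
    logits = np.concatenate([scores, sink], axis=-1)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    # Drop the sink column: each row now sums to less than 1, with the
    # remainder absorbed by the sink.
    return probs[..., :k]

rng = np.random.default_rng(0)
scores = rng.standard_normal((2, 3, 4)).astype(np.float32)
p = attention_probs_with_head_sink(scores, np.zeros(2, dtype=np.float32))
```

Because the sink takes part in the softmax normalization but is discarded afterward, the returned probabilities over real keys sum to strictly less than 1 per row.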
