[CUDA] Support head_sink in flash attention for GQA #25432
support head sink in flash attention for GQA
ed29822a
update comments
8a8ee9f4
remove unused script
cef06642
fix build
1cf1aa78
tianleiwu merged e6c84b80 into main 248 days ago
tianleiwu deleted the tlwu/gqa_head_sink_cuda branch 248 days ago