onnxruntime
[CPU] GQA supports head_sink input for smooth softmax
#25269
Merged

tianleiwu merged 11 commits into main from tlwu/gqa_head_sink
tianleiwu update spec
1d84450c
tianleiwu marked this pull request as draft 173 days ago
tianleiwu Implement CPU
a1b51b74
github-actions commented on 2025-07-05
tianleiwu merge main
4a9f45c4
tianleiwu add cpu test
e8b1490c
tianleiwu Merge branch 'main' into tlwu/gqa_head_sink
fdb8619f
tianleiwu reduce tests
fc6d1b3f
tianleiwu changed the title from "[WIP] GQA supports per head smooth softmax" to "[CPU] GQA supports per head smooth softmax" 167 days ago
tianleiwu lintrunner
489dba20
tianleiwu marked this pull request as ready for review 167 days ago
tianleiwu requested a review from fs-eire 167 days ago
tianleiwu requested a review from kunal-vaishnavi 167 days ago
tianleiwu revert some files
1a1421d0
tianleiwu changed the title from "[CPU] GQA supports per head smooth softmax" to "[CPU] GQA supports head_sink input for smooth softmax" 167 days ago
tianleiwu revert cuda/rocm
d09d8049
tianleiwu update doc
19e3f55f
tianleiwu update cuda script
60de045c
fs-eire commented on 2025-07-09
nenad1002 approved these changes on 2025-07-09
tianleiwu merged cd5f91fe into main 167 days ago
tianleiwu deleted the tlwu/gqa_head_sink branch 167 days ago