onnxruntime
Add support for custom position ids and attention bias to GQA CPU operator
#23944
Merged


derdeljan-msft merged 25 commits into main from derdeljan/gqa-tree-decoding
derdeljan-msft Scalar support for custom position ids and mask in GQA
d7f5aa1a
derdeljan-msft Vectorized attention mask application for fp32
15172c3b
derdeljan-msft Vectorized attention mask application for fp16
d7eae786
derdeljan-msft Add mask upscale to fp32 if the platform doesn't support fp16
9d244dd7
derdeljan-msft Fix typo in fp16 eltwise kernels
8faee661
derdeljan-msft Add validation for custom attention parameters
147d19b6
derdeljan-msft Add mlas unit test for eltwise kernels
4b1262eb
derdeljan-msft Refactor python unit GQA tests
f7a07881
derdeljan-msft Cleanup comments
9dec0564
derdeljan-msft requested a review from tianleiwu 292 days ago
derdeljan-msft requested a review from jywu-msft 292 days ago
derdeljan-msft requested a review from fajin-corp 292 days ago
derdeljan-msft requested a review from aciddelgado 292 days ago
derdeljan-msft assigned derdeljan-msft 292 days ago
derdeljan-msft requested a review 292 days ago
derdeljan-msft changed the title from "Add support custom position ids and attention mask to GQA CPU operator" to "Add support for custom position ids and attention mask to GQA CPU operator" 292 days ago
github-advanced-security commented on 2025-03-07
jywu-msft requested a review from liqunfu 292 days ago
kunal-vaishnavi commented on 2025-03-07
kunal-vaishnavi commented on 2025-03-07
kunal-vaishnavi commented on 2025-03-07
kunal-vaishnavi commented on 2025-03-07
aciddelgado commented on 2025-03-07
derdeljan-msft Fix CI pipeline errors
5d23817e
github-actions commented on 2025-03-07
derdeljan-msft Apply suggestions from code review
42e83d68
derdeljan-msft Fix docs pipeline build
bc0d69b9
derdeljan-msft Fix docs pipeline build
ab60cbc6
fajin-corp commented on 2025-03-10
fajin-corp commented on 2025-03-10
derdeljan-msft Fix first batch of PR comments
4e0ca5c2
derdeljan-msft Fix PR comments
949118f5
github-actions commented on 2025-03-13
derdeljan-msft Linter fix
62d39a5a
tianleiwu commented on 2025-03-13
derdeljan-msft Update attention_mask input description
0349678c
derdeljan-msft Fix build break
0865ddbf
derdeljan-msft Fix docs gen CI pipeline
55e09c9a
tianleiwu commented on 2025-03-13
tianleiwu commented on 2025-03-13
tianleiwu commented on 2025-03-13
tianleiwu commented on 2025-03-13
tianleiwu commented on 2025-03-13
tianleiwu commented on 2025-03-13
derdeljan-msft Apply attention mask after softcap
e3bc338c
derdeljan-msft Cleanup mlas eltwise module
757af32c
tianleiwu commented on 2025-03-13
tianleiwu commented on 2025-03-13
tianleiwu commented on 2025-03-13
tianleiwu commented on 2025-03-13
tianleiwu commented on 2025-03-13
derdeljan-msft Fix PR comments
0c268c94
github-actions commented on 2025-03-13
derdeljan-msft Fix position_ids handling for the first prompt
c36a9cfd
derdeljan-msft Fix build break
86a7737f
derdeljan-msft requested a review from tianleiwu 285 days ago
derdeljan-msft requested a review from fajin-corp 285 days ago
derdeljan-msft changed the title from "Add support for custom position ids and attention mask to GQA CPU operator" to "Add support for custom position ids and attention bias to GQA CPU operator" 285 days ago
tianleiwu commented on 2025-03-14
tianleiwu commented on 2025-03-14
derdeljan-msft Fix PR comments and fix docs gen CI pipeline
56fe7683
tianleiwu approved these changes on 2025-03-14
fajin-corp approved these changes on 2025-03-14
derdeljan-msft merged 5a694bcb into main 284 days ago
derdeljan-msft deleted the derdeljan/gqa-tree-decoding branch 284 days ago
