Add support for custom position ids and attention bias to GQA CPU operator #23944
Scalar support for custom position ids and mask in GQA
d7f5aa1a
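A minimal sketch of what a scalar "add the custom bias/mask to the raw QK scores" step looks like; the function name and layout are illustrative, not the actual ORT implementation:

```cpp
#include <cstddef>

// Adds a per-position attention bias to the raw QK scores for one query row.
// scores: [total_seq_len] dot-product scores for a single query token
// bias:   [total_seq_len] custom attention bias (masked positions typically
//         carry a large negative value so they vanish after softmax)
void ApplyAttentionBiasScalar(float* scores, const float* bias, size_t total_seq_len) {
  for (size_t i = 0; i < total_seq_len; ++i) {
    scores[i] += bias[i];
  }
}
```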
Vectorized attention mask application for fp32
15172c3b
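For the vectorized fp32 path, the same addition can be done several lanes at a time. The sketch below assumes an SSE2-capable x86 target and uses illustrative names rather than the MLAS symbols; the fp16 variant in the next commit follows the same pattern with half-precision lanes:

```cpp
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#endif

// Vectorized "scores += bias" over n fp32 elements, with a scalar tail.
void AddAttentionBiasF32(float* scores, const float* bias, size_t n) {
  size_t i = 0;
#if defined(__SSE2__) || defined(_M_X64)
  for (; i + 4 <= n; i += 4) {
    __m128 s = _mm_loadu_ps(scores + i);
    __m128 b = _mm_loadu_ps(bias + i);
    _mm_storeu_ps(scores + i, _mm_add_ps(s, b));  // four lanes at a time
  }
#endif
  for (; i < n; ++i) {
    scores[i] += bias[i];  // remaining tail elements
  }
}
```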
Vectorized attention mask application for fp16
d7eae786
Add mask upscale to fp32 if the platform doesn't support fp16
9d244dd7
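Where the CPU lacks native fp16 arithmetic, the fp16 bias can be widened to fp32 and applied against the fp32 scores. A self-contained sketch of that fallback with a hand-rolled binary16-to-float conversion; the names are stand-ins, not the MLAS routines:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Converts one IEEE binary16 value (stored as uint16_t) to float.
static float HalfToFloat(uint16_t h) {
  const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t mant = h & 0x3FFu;
  uint32_t bits;
  if (exp == 0x1Fu) {
    bits = sign | 0x7F800000u | (mant << 13);           // infinity / NaN
  } else if (exp != 0) {
    bits = sign | ((exp + 112u) << 23) | (mant << 13);  // normal number
  } else if (mant == 0) {
    bits = sign;                                        // signed zero
  } else {
    uint32_t e = 113;                                   // subnormal: renormalize
    while ((mant & 0x400u) == 0) { mant <<= 1; --e; }
    bits = sign | (e << 23) | ((mant & 0x3FFu) << 13);
  }
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Fallback: widen an fp16 attention bias row to fp32 and add it to fp32 scores.
void ApplyFp16BiasAsFp32(float* scores, const uint16_t* bias_fp16, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    scores[i] += HalfToFloat(bias_fp16[i]);
  }
}
```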
Fix typo in fp16 eltwise kernels
8faee661
Add validation for custom attention parameters
147d19b6
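The kind of shape validation such a commit typically adds, sketched with assumed layouts of [batch, sequence_length] for position_ids and [batch, 1 or num_heads, sequence_length, total_sequence_length] for attention_bias; the operator's real constraints are defined in the ORT contrib-op docs:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Illustrative checks for the optional custom attention inputs.
void CheckCustomAttentionInputs(const std::vector<int64_t>& position_ids_shape,
                                const std::vector<int64_t>& attention_bias_shape,
                                int64_t batch, int64_t num_heads,
                                int64_t seq_len, int64_t total_seq_len) {
  if (!position_ids_shape.empty()) {
    if (position_ids_shape.size() != 2 ||
        position_ids_shape[0] != batch ||
        position_ids_shape[1] != seq_len) {
      throw std::invalid_argument("position_ids must have shape [batch, sequence_length]");
    }
  }
  if (!attention_bias_shape.empty()) {
    if (attention_bias_shape.size() != 4 ||
        attention_bias_shape[0] != batch ||
        (attention_bias_shape[1] != 1 && attention_bias_shape[1] != num_heads) ||
        attention_bias_shape[2] != seq_len ||
        attention_bias_shape[3] != total_seq_len) {
      throw std::invalid_argument(
          "attention_bias must have shape "
          "[batch, 1|num_heads, sequence_length, total_sequence_length]");
    }
  }
}
```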
Add mlas unit test for eltwise kernels
4b1262eb
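An illustrative standalone check in the spirit of an element-wise kernel test: run the kernel under test against a scalar reference over several lengths so any vector tail handling is exercised. This is not the MLAS test harness, and the function names are stand-ins:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar reference for "out = a + b".
static void EltwiseAddRef(const float* a, const float* b, float* out, size_t n) {
  for (size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Stand-in for the vectorized kernel under test.
static void EltwiseAddKernel(const float* a, const float* b, float* out, size_t n) {
  for (size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

int main() {
  const size_t sizes[] = {1, 3, 8, 17, 1024};  // odd lengths exercise the vector tail
  for (size_t n : sizes) {
    std::vector<float> a(n), b(n), got(n), want(n);
    for (size_t i = 0; i < n; ++i) {
      a[i] = 0.5f * static_cast<float>(i);
      b[i] = -1.0f + 0.25f * static_cast<float>(i % 7);
    }
    EltwiseAddKernel(a.data(), b.data(), got.data(), n);
    EltwiseAddRef(a.data(), b.data(), want.data(), n);
    for (size_t i = 0; i < n; ++i) {
      assert(std::fabs(got[i] - want[i]) <= 1e-6f);
    }
  }
  return 0;
}
```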
Refactor Python GQA unit tests
f7a07881
Cleanup comments
9dec0564
derdeljan-msft changed the title from "Add support custom position ids and attention mask to GQA CPU operator" to "Add support for custom position ids and attention mask to GQA CPU operator" 292 days ago
Fix CI pipeline errors
5d23817e
Apply suggestions from code review
42e83d68
Fix docs pipeline build
bc0d69b9
Fix docs pipeline build
ab60cbc6
Fix first batch of PR comments
4e0ca5c2
Fix PR comments
949118f5
Linter fix
62d39a5a
Update attention_mask input description
0349678c
Fix build break
0865ddbf
Fix docs gen CI pipeline
55e09c9a
Apply attention mask after softcap
e3bc338c
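The ordering this commit settles on, sketched for one row of scores: scale, apply the softcap (tanh), then add the custom attention bias, then softmax. Names and structure are illustrative, not lifted from the ORT kernel:

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Processes one row of attention scores in place.
void SoftcapThenBiasThenSoftmax(float* scores, const float* bias,
                                size_t n, float scale, float softcap) {
  float max_val = -std::numeric_limits<float>::infinity();
  for (size_t i = 0; i < n; ++i) {
    float s = scores[i] * scale;
    if (softcap > 0.0f) {
      s = softcap * std::tanh(s / softcap);  // softcap applied first
    }
    s += bias[i];                            // custom attention bias after softcap
    scores[i] = s;
    if (s > max_val) max_val = s;
  }
  float sum = 0.0f;
  for (size_t i = 0; i < n; ++i) {
    scores[i] = std::exp(scores[i] - max_val);  // numerically stable softmax
    sum += scores[i];
  }
  for (size_t i = 0; i < n; ++i) scores[i] /= sum;
}
```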
Cleanup mlas eltwise module
757af32c
Fix PR comments
0c268c94
Fix position_ids handling for the first prompt
c36a9cfd
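A hypothetical helper illustrating the intended behaviour: on the first (prompt) pass positions default to 0..sequence_length-1 per batch entry, during decoding they continue from the KV-cache length, and caller-supplied position_ids always take precedence. This is a sketch of the idea, not the operator's code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Resolves the position id for every token in the current pass.
std::vector<int64_t> ResolvePositionIds(const int64_t* custom_position_ids,  // may be nullptr
                                        int64_t batch, int64_t seq_len,
                                        int64_t past_seq_len) {
  std::vector<int64_t> pos(static_cast<size_t>(batch * seq_len));
  for (int64_t b = 0; b < batch; ++b) {
    for (int64_t s = 0; s < seq_len; ++s) {
      const size_t idx = static_cast<size_t>(b * seq_len + s);
      if (custom_position_ids != nullptr) {
        pos[idx] = custom_position_ids[idx];  // caller-provided positions win
      } else if (past_seq_len == 0) {
        pos[idx] = s;                         // first prompt: 0..seq_len-1
      } else {
        pos[idx] = past_seq_len + s;          // decoding: continue past the cache
      }
    }
  }
  return pos;
}
```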
Fix build break
86a7737f
derdeljan-msft changed the title from "Add support for custom position ids and attention mask to GQA CPU operator" to "Add support for custom position ids and attention bias to GQA CPU operator" 285 days ago
Fix PR comments and fix docs gen CI pipeline
56fe7683
tianleiwu approved these changes on 2025-03-14
derdeljan-msft deleted the derdeljan/gqa-tree-decoding branch 284 days ago