onnxruntime
[CPU] Optimize GQA attention bias application for FP16
#25871
Merged
derdeljan-msft merged 1 commit into main from derdeljan/optimize_gqa_spec_decoding_fp16
Optimize fp16 attention bias application 8d1bd01d
derdeljan-msft requested a review from tianleiwu 189 days ago
derdeljan-msft requested a review from jywu-msft 189 days ago
derdeljan-msft requested a review from kunal-vaishnavi 189 days ago
derdeljan-msft requested a review from aciddelgado 189 days ago
derdeljan-msft assigned derdeljan-msft 189 days ago
kunal-vaishnavi approved these changes on 2025-08-27
derdeljan-msft merged e525ea22 into main 188 days ago
derdeljan-msft deleted the derdeljan/optimize_gqa_spec_decoding_fp16 branch 188 days ago
jywu-msft added release:1.23.0
snnn removed release:1.23.0
Reviewers: kunal-vaishnavi, tianleiwu, jywu-msft, aciddelgado
Assignees: derdeljan-msft
Labels: None yet
Milestone: No milestone