onnxruntime
[CPU] Optimize GQA attention bias application for FP16
#25871
Merged

derdeljan-msft committed 8d1bd01d: Optimize fp16 attention bias application
derdeljan-msft requested a review from tianleiwu 189 days ago
derdeljan-msft requested a review from jywu-msft 189 days ago
derdeljan-msft requested a review from kunal-vaishnavi 189 days ago
derdeljan-msft requested a review from aciddelgado 189 days ago
derdeljan-msft assigned derdeljan-msft 189 days ago
kunal-vaishnavi approved these changes on 2025-08-27
derdeljan-msft merged e525ea22 into main 188 days ago
derdeljan-msft deleted the derdeljan/optimize_gqa_spec_decoding_fp16 branch 188 days ago
jywu-msft added the release:1.23.0 label
snnn removed the release:1.23.0 label