GQA unfused attention with FP32 QK accumulation (fixes #28195) #28198
GQA unfused attention with FP32 QK accumulation (fixes #28195)
99aa8b22
review feedback
4fa20eff
fix: address review feedback - SafeInt AlignTo, y_bnsh H_v, ORT_ENFORCE
eeec5123
fix: address review summary feedback - SafeInt, logging, tests, v_hea…
b0f71ac0
Add C++ tests for GQA unfused attention with large head_size
778ee069
feedbacks
6354c525
address feedback
158a004d
fix build
0fcde9a9
tianleiwu dismissed their stale review via 0fcde9a9 21 days ago
tianleiwu merged 997c4798 into main 20 days ago
tianleiwu deleted the tlwu/unfused_gqa branch 20 days ago
Assignees
No one assigned