onnxruntime
Update replacing MultiHeadAttention with GroupQueryAttention
#19882
Merged

kunal-vaishnavi
kunal-vaishnavi Update replacing MHA with GQA
f9492a1e
aciddelgado
aciddelgado approved these changes on 2024-03-13
kunal-vaishnavi merged 4ac98d6d into main 1 year ago
kunal-vaishnavi added release:1.17.3
