onnxruntime
819b5a3e
- Split KV on MHA and Attention ops (#18007)
2 years ago
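Split KV on MHA and Attention ops (#18007)

### Description

Implement the Split KV optimization for FlashAttention in the MultiHeadAttention (MHA) and Attention operators.

### Motivation and Context

Splitting the key/value sequence into chunks that are processed in parallel can help further accelerate these ops.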
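The commit message does not describe the kernel itself. The general split-KV idea (also known as Flash-Decoding) can be sketched as follows. Each chunk of the key/value sequence produces a partial result: its maximum score, its sum of exponentials, and an unnormalized output. A final combine step rescales the partials to a shared maximum and normalizes them. This is a minimal single-query, single-head sketch in plain Python, assuming nothing about onnxruntime's actual CUDA implementation. The function names `attention_reference`, `partial_attention`, and `split_kv_attention` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_reference(q: Vector, keys: List[Vector], values: List[Vector], scale: float) -> Vector:
    # Standard (unsplit) softmax attention for a single query vector.
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def partial_attention(q: Vector, keys: List[Vector], values: List[Vector],
                      scale: float) -> Tuple[float, float, Vector]:
    # Attention over one KV chunk. Returns (chunk max score, sum of exp, unnormalized output).
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), acc


def split_kv_attention(q: Vector, keys: List[Vector], values: List[Vector],
                       scale: float, num_splits: int) -> Vector:
    # Hypothetical sketch: split the KV sequence into chunks. On a GPU each chunk
    # would go to its own thread block; here the chunks are processed in a loop.
    n = len(keys)
    chunk = math.ceil(n / num_splits)
    partials = [partial_attention(q, keys[s:s + chunk], values[s:s + chunk], scale)
                for s in range(0, n, chunk)]

    # Combine step: rescale every partial to the global max, then normalize once.
    m_global = max(m for m, _, _ in partials)
    dim = len(values[0])
    l_global = 0.0
    out = [0.0] * dim
    for m, l, acc in partials:
        factor = math.exp(m - m_global)
        l_global += l * factor
        for d in range(dim):
            out[d] += acc[d] * factor
    return [o / l_global for o in out]


# Usage: the split result should match the unsplit reference up to rounding.
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.2, 0.3]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
scale = 1.0 / math.sqrt(len(q))
print(attention_reference(q, keys, values, scale))
print(split_kv_attention(q, keys, values, scale, num_splits=2))
```

The combine step only needs three small values per chunk, so the chunks can be computed independently. This is what lets a split-KV kernel use more of the GPU when batch size times number of heads alone is too small to fill it.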
References
#18007 - Split KV on MHA and Attention ops
Author
aciddelgado
Parents
c1811597