onnxruntime
742edec5
[CUDA] Add PackedMultiHeadAttention operator (#16779)
Commit
2 years ago
[CUDA] Add PackedMultiHeadAttention operator (#16779)

### Description
Add a new operator for MultiHeadAttention whose inputs have had their padding removed. It supports only the packed QKV format.
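To show what "inputs with padding removed" means, here is a minimal NumPy reference sketch. It is not the CUDA kernel from this commit. It assumes that packed QKV is laid out as (token_count, num_heads, 3, head_size) and that sequence boundaries are passed as a cumulative-sequence-length offset array. `pack_tokens`, `packed_qkv_attention` and `cu_seqlens` are hypothetical names chosen for illustration and are not the operator's actual inputs or API.

```python
import numpy as np

def pack_tokens(padded, seq_lens):
    """Drop padding: (batch, max_seq, ...) -> (total_tokens, ...).

    Hypothetical helper; assumes every entry in seq_lens is >= 1.
    """
    rows = [padded[b, :n] for b, n in enumerate(seq_lens)]
    packed = np.concatenate(rows, axis=0)
    # cu_seqlens[b] is where sequence b starts in the packed tensor;
    # cu_seqlens[-1] is the total number of real (non-padding) tokens.
    cu_seqlens = np.concatenate([[0], np.cumsum(seq_lens)]).astype(np.int32)
    return packed, cu_seqlens

def packed_qkv_attention(packed_qkv, cu_seqlens, num_heads):
    """Scaled dot-product attention over packed tokens.

    Assumed layout: packed_qkv is (total_tokens, num_heads, 3, head_size),
    where index 0/1/2 on axis 2 selects Q/K/V. Returns
    (total_tokens, num_heads * head_size).
    """
    total, h, three, d = packed_qkv.shape
    assert h == num_heads and three == 3
    out = np.empty((total, num_heads, d), dtype=packed_qkv.dtype)
    # Each sequence attends only to its own tokens, so no padding mask is needed.
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        q = packed_qkv[start:end, :, 0, :].transpose(1, 0, 2)  # (h, s, d)
        k = packed_qkv[start:end, :, 1, :].transpose(1, 0, 2)
        v = packed_qkv[start:end, :, 2, :].transpose(1, 0, 2)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (h, s, s)
        scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
        probs = np.exp(scores)
        probs /= probs.sum(axis=-1, keepdims=True)
        out[start:end] = (probs @ v).transpose(1, 0, 2)  # back to (s, h, d)
    return out.reshape(total, num_heads * d)
```

Removing padding means no compute is spent on pad tokens, and the attention mask is replaced by the per-sequence offsets. The result for each real token should match ordinary padded attention that uses a key-padding mask.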
References
#16779 - [CUDA] Add PackedMultiHeadAttention operator
Author
tianleiwu
Parents
7c05f7ba