onnxruntime
742edec5 - [CUDA] Add PackedMultiHeadAttention operator (#16779)

Commit
2 years ago
### Description

Add a new operator for MultiHeadAttention whose inputs have had padding removed. It supports only the packed QKV format.
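To illustrate what removing padding from a batch means, here is a minimal sketch in plain Python. It is not the operator's implementation or its interface: the function name `remove_padding` and the `cu_seqlens` offsets list are hypothetical, chosen only for this example, and the tokens are plain integers rather than QKV tensors.

```python
from itertools import accumulate


def remove_padding(batch, lengths):
    """Flatten a padded [batch, max_len] nested list into packed tokens.

    Hypothetical helper for illustration only.
    Returns (packed, cu_seqlens), where packed[cu_seqlens[i]:cu_seqlens[i + 1]]
    holds the real (non-padding) tokens of sequence i.
    """
    # Keep only the first `n` tokens of each row; the rest is padding.
    packed = [tok for row, n in zip(batch, lengths) for tok in row[:n]]
    # Cumulative offsets mark where each sequence starts in `packed`.
    cu_seqlens = [0, *accumulate(lengths)]
    return packed, cu_seqlens


# Two sequences padded to length 4 with 0; the real lengths are 3 and 2.
padded = [[11, 12, 13, 0], [21, 22, 0, 0]]
packed, cu_seqlens = remove_padding(padded, [3, 2])
print(packed)      # [11, 12, 13, 21, 22]
print(cu_seqlens)  # [0, 3, 5]
```

With padding removed, work scales with the number of real tokens (5 here) rather than with batch size times maximum length (8 here).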