onnxruntime
4a196d15 - Packed QKV and Rotary Embedding Support for sm<80 GQA (#20012)

1 year ago
### Description

Add support for packed QKV input and rotary embedding in grouped-query attention (GQA) on GPUs with compute capability below 8.0 (sm<80), using the memory-efficient attention kernel.

### Motivation and Context

Allows lower-end GPUs to run GQA with packed QKV input and rotary embedding.
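The commit message does not include code. The following pure-Python sketch roughly illustrates the two input features it mentions. It makes three assumptions. First, the packed QKV tensor stores, for each token, all query heads, then all key heads, then all value heads. Second, rotary embedding uses the non-interleaved "rotate half" form with a base of 10000. Third, GQA maps each contiguous group of query heads to one shared KV head. The helpers `split_packed_qkv`, `rotary_half` and `kv_head_for_query` are hypothetical and are not onnxruntime APIs. The sketch does not reproduce the sm<80 memory-efficient attention CUDA kernel.

```python
import math
from typing import List, Tuple

Vector = List[float]


def split_packed_qkv(token: Vector, num_heads: int, kv_num_heads: int,
                     head_size: int) -> Tuple[List[Vector], List[Vector], List[Vector]]:
    """Split one token's packed QKV vector into per-head Q, K and V vectors.

    Assumed (hypothetical) layout: [Q_0..Q_{num_heads-1}, K_0..K_{kv-1}, V_0..V_{kv-1}],
    where each head occupies head_size contiguous elements.
    """
    total_heads = num_heads + 2 * kv_num_heads
    expected = total_heads * head_size
    if len(token) != expected:
        raise ValueError(f"expected {expected} elements, got {len(token)}")
    heads = [token[i * head_size:(i + 1) * head_size] for i in range(total_heads)]
    q = heads[:num_heads]
    k = heads[num_heads:num_heads + kv_num_heads]
    v = heads[num_heads + kv_num_heads:]
    return q, k, v


def rotary_half(x: Vector, position: int, base: float = 10000.0) -> Vector:
    """Apply non-interleaved ("rotate half") rotary embedding to one head vector.

    Element i is rotated together with element i + d/2 by angle
    position * base^(-2i/d). Applied to Q and K only, never to V.
    """
    d = len(x)
    if d % 2:
        raise ValueError("head_size must be even for rotary embedding")
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x2 * c + x1 * s
    return out


def kv_head_for_query(q_head: int, num_heads: int, kv_num_heads: int) -> int:
    """GQA: each contiguous group of num_heads // kv_num_heads query heads shares one KV head."""
    return q_head // (num_heads // kv_num_heads)


# Usage: 4 query heads share 2 KV heads; head_size 4 gives a packed width of (4 + 2*2) * 4 = 32.
token = [float(i) for i in range(32)]
q, k, v = split_packed_qkv(token, num_heads=4, kv_num_heads=2, head_size=4)
q_rot = [rotary_half(h, position=3) for h in q]
k_rot = [rotary_half(h, position=3) for h in k]
```

Because the input is packed, the kernel receives a single tensor instead of separate Q, K and V inputs. Rotary embedding is applied to the query and key slices after they are split out.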