onnxruntime
819b5a3e - Split KV on MHA and Attention ops (#18007)

Commit
2 years ago
Split KV on MHA and Attention ops (#18007)

### Description
Implement Split KV optimization for FlashAttention in MHA and Attention operators.

### Motivation and Context
Can help further accelerate these ops.
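
To illustrate what a split-KV scheme generally computes, here is a minimal pure-Python sketch for a single query vector. It is not the kernel from this commit. The function names (`attention`, `partial_attention`, `split_kv_attention`) and the `num_splits` parameter are hypothetical and chosen only for illustration. The idea, as commonly described, is to partition the key/value sequence into chunks, run attention on each chunk independently (on a GPU these would run in parallel), and then merge the partial outputs using each chunk's log-sum-exp so the result matches unsplit attention.

```python
import math


def _scores(q, keys):
    """Scaled dot-product scores of one query against a list of keys."""
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(a * b for a, b in zip(q, k)) for k in keys]


def attention(q, keys, values):
    """Reference: softmax(q . K^T / sqrt(d)) V for a single query vector."""
    out, _ = partial_attention(q, keys, values)
    return out


def partial_attention(q, keys, values):
    """Attention over one KV chunk.

    Returns the chunk-normalized output and the log-sum-exp of the
    chunk's scores, which is what the merge step needs.
    """
    scores = _scores(q, keys)
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) / total
           for j in range(dim)]
    return out, m + math.log(total)


def split_kv_attention(q, keys, values, num_splits):
    """Hypothetical split-KV attention: chunk K/V, then merge partials."""
    chunk = math.ceil(len(keys) / num_splits)
    partials = [
        partial_attention(q, keys[i:i + chunk], values[i:i + chunk])
        for i in range(0, len(keys), chunk)
    ]
    # Each chunk's weight in the merge is proportional to exp(lse_chunk).
    max_lse = max(lse for _, lse in partials)
    coeffs = [math.exp(lse - max_lse) for _, lse in partials]
    denom = sum(coeffs)
    dim = len(values[0])
    return [sum(c * out[j] for c, (out, _) in zip(coeffs, partials)) / denom
            for j in range(dim)]
```

Because each chunk is processed independently, the work over a long KV sequence can be spread across more parallel units when there are few queries to parallelize over. The merge step adds only a small reduction over the per-chunk results.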