onnxruntime
30c5f059 - Add Paged Attention Op for CUDA SM80 support (#24595)

Commit
200 days ago
Add Paged Attention Op for CUDA SM80 support (#24595)

### Description

Adds a Paged Attention op, which enables the use of a paged KV cache. Inputs to this op are unpadded (packed / varlen), so cumulative sequence lengths are a required input.

### Motivation and Context

Adding this op to ONNXRuntime is necessary to allow the GenAI team to enable a continuous batching server API.
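To illustrate why packed (varlen) inputs need cumulative sequence lengths, here is a minimal Python sketch. The helper names (`cumulative_seqlens`, `pack`, `unpack`) are hypothetical and are not the op's actual input names or API. The sketch only shows the general idea: once padding is removed, the offsets are the only record of where each sequence starts and ends in the flat buffer.

```python
from itertools import accumulate


def cumulative_seqlens(seq_lens):
    """Return offsets [0, l0, l0+l1, ...] for sequences stored back to back.

    Hypothetical helper: sequence i occupies packed[cu[i]:cu[i + 1]].
    """
    return [0, *accumulate(seq_lens)]


def pack(sequences):
    """Concatenate variable-length sequences without padding."""
    packed = [token for seq in sequences for token in seq]
    cu = cumulative_seqlens([len(seq) for seq in sequences])
    return packed, cu


def unpack(packed, cu):
    """Recover the individual sequences from the packed buffer and offsets."""
    return [packed[cu[i]:cu[i + 1]] for i in range(len(cu) - 1)]


# Three sequences of lengths 3, 1, and 2 (token ids are made up).
batch = [[11, 12, 13], [21], [31, 32]]
packed, cu = pack(batch)
restored = unpack(packed, cu)
```

Without the offsets, the six packed tokens could not be split back into three sequences, which is presumably why the op treats them as a required input rather than an optional one.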