onnxruntime
a1bbfeb3 - add split3inner (#19886)

Commit
1 year ago
### Description
The Split op uses pinned host memory (pin_memory) when splitting into pieces of different sizes, but pin_memory is not compatible with CUDA graph capture. This change adds a new implementation, intended only for transformer scenarios, that splits qkv_proj into q, k, and v without using pin_memory.
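To make the intended result concrete, here is a minimal reference sketch of a three-way split along the innermost dimension. It is not the CUDA kernel from this commit: the function name `split3_inner`, its parameters, and the row-major flat-list layout are assumptions chosen for illustration. The three sizes are passed as plain scalars, which mirrors the idea of not staging a size table through pinned memory.

```python
def split3_inner(flat, inner_size, size0, size1, size2):
    """Split the innermost dimension of a row-major buffer into three parts.

    Hypothetical reference helper (not an onnxruntime API).
    `flat` holds `outer * inner_size` elements; each row of length
    `inner_size` is cut into chunks of size0, size1 and size2.
    """
    if size0 + size1 + size2 != inner_size:
        raise ValueError("split sizes must sum to the inner dimension")
    if len(flat) % inner_size != 0:
        raise ValueError("buffer length must be a multiple of inner_size")

    outer = len(flat) // inner_size
    q, k, v = [], [], []
    for row in range(outer):
        base = row * inner_size
        # The three chunks of a row are contiguous, so only scalar
        # offsets are needed to locate them.
        q.extend(flat[base:base + size0])
        k.extend(flat[base + size0:base + size0 + size1])
        v.extend(flat[base + size0 + size1:base + inner_size])
    return q, k, v


# Two rows of a fused qkv_proj output with inner size 8,
# split unevenly into q (4), k (2) and v (2).
qkv = list(range(16))
q, k, v = split3_inner(qkv, inner_size=8, size0=4, size1=2, size2=2)
```

In this sketch each output keeps the outer dimension and receives its own slice of every row, so `q` has shape (2, 4) while `k` and `v` have shape (2, 2) when viewed row-major.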