onnxruntime
adb0be45 - Refactoring of attention cuda kernel: move prepare qkv and concat_past_to_present (#17559)

Commit
2 years ago
Refactoring of attention cuda kernel: move prepare qkv and concat_past_to_present (#17559)

To avoid a huge cu file and make code more readable:
- Move PrepareQKV to separate cu file (attention_prepare_qkv.cu)
- Move ConcatPastToPresent to attention_concat.cu
- Add default value for AttentionData
- Add a data structure QkvData to track Q, K and V pointers and track QKV format.