onnxruntime
adb0be45 - Refactoring of attention cuda kernel: move prepare qkv and concat_past_to_present (#17559)

Commit

2 years ago

Refactoring of attention cuda kernel: move prepare qkv and concat_past_to_present (#17559)

To avoid a huge .cu file and make the code more readable:

- Move PrepareQKV to a separate .cu file (attention_prepare_qkv.cu)
- Move ConcatPastToPresent to attention_concat.cu
- Add default values for AttentionData
- Add a data structure, QkvData, that tracks the Q, K and V pointers and the QKV format
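The last two bullets describe a common C++ pattern. The sketch below shows default member initializers on an attention data struct, plus a small struct that bundles the Q, K and V pointers with their memory layout. It is a minimal illustration, not the code from this commit. The enum `QkvFormat` and its values, the struct `AttentionDataSketch` and its fields, and the helpers `MakeSeparateQkv` and `MakePackedQkv` are hypothetical names chosen for illustration. onnxruntime's actual `AttentionData` and `QkvData` may differ.

```cpp
#include <cassert>

// Hypothetical layout tags for illustration only; onnxruntime's own enum
// may use different names and values.
enum class QkvFormat {
  Q_K_V_BSNH,  // separate Q, K, V tensors, each [batch, seq_len, num_heads, head_size]
  Q_K_V_BNSH,  // separate Q, K, V tensors, each [batch, num_heads, seq_len, head_size]
  QKV_BSN3H,   // one packed tensor [batch, seq_len, num_heads, 3, head_size]
};

// Groups the Q, K and V pointers with the layout they are stored in, so
// callers pass one object instead of three pointers plus a format flag.
template <typename T>
struct QkvData {
  const T* q = nullptr;
  const T* k = nullptr;
  const T* v = nullptr;
  QkvFormat format = QkvFormat::Q_K_V_BSNH;
};

// Hypothetical subset of an AttentionData-like struct. Default member
// initializers mean a default-constructed object never holds garbage pointers.
template <typename T>
struct AttentionDataSketch {
  const T* query = nullptr;
  const T* key = nullptr;
  const T* value = nullptr;
  const T* bias = nullptr;
  T* output = nullptr;
  bool use_memory_efficient_attention = false;
};

// Separate Q, K and V buffers: record all three pointers and their layout.
template <typename T>
QkvData<T> MakeSeparateQkv(const T* q, const T* k, const T* v, QkvFormat format) {
  QkvData<T> data;
  data.q = q;
  data.k = k;
  data.v = v;
  data.format = format;
  return data;
}

// Packed QKV: a single buffer holds all three, so only q is set and the
// format tells the consumer how to locate K and V inside that buffer.
template <typename T>
QkvData<T> MakePackedQkv(const T* packed) {
  assert(packed != nullptr);
  QkvData<T> data;
  data.q = packed;
  data.format = QkvFormat::QKV_BSN3H;
  return data;
}
```

In this sketch, downstream code checks `format` once to decide how to read the buffers. It does not need to infer the layout from which pointers happen to be null.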

References

#17559 - Refactoring of attention cuda kernel: move prepare qkv and concat_past_to_present

Author

tianleiwu

Parents

af80542e