onnxruntime
Refactoring of attention cuda kernel: move prepare qkv and concat_past_to_present
#17559
Merged

tianleiwu merged 3 commits into main from tlwu/prepare_qkv_refactor
tianleiwu move prepare qkv and present
33a64056
tianleiwu marked this pull request as draft 2 years ago
tianleiwu update include
7a64f986
tianleiwu fix hipify
36260e8d
tianleiwu marked this pull request as ready for review 2 years ago
tianleiwu requested a review from aciddelgado 2 years ago
tianleiwu requested a review from kunal-vaishnavi 2 years ago
tianleiwu requested a review from wangyems 2 years ago
aciddelgado approved these changes on 2023-09-15
tianleiwu merged adb0be45 into main 2 years ago
tianleiwu deleted the tlwu/prepare_qkv_refactor branch 2 years ago
tianleiwu added release:1.16.2
faxu added triage:approved
faxu added sdxl_llama
tianleiwu removed triage:approved
tianleiwu removed release:1.16.2
tianleiwu removed sdxl_llama
