Refactoring of attention cuda kernel: move prepare qkv and concat_past_to_present #17559
move prepare qkv and present
33a64056
tianleiwu marked this pull request as draft 2 years ago
update include
7a64f986
fix hipify
36260e8d
tianleiwu marked this pull request as ready for review 2 years ago
tianleiwu merged commit adb0be45 into main 2 years ago
tianleiwu deleted the tlwu/prepare_qkv_refactor branch 2 years ago
faxu added the triage:approved label
faxu added the sdxl_llama label
Assignees
No one assigned