onnxruntime
Refactor Attention CUDA kernel
#17578
Merged

tianleiwu merged 8 commits into main from tlwu/refactor_decoder_att
tianleiwu: move decoder attention and kv cache kernel (b1035615)
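For context, below is a minimal sketch of the kind of KV-cache concatenation kernel this commit relocates. The BNSH layout, the float element type, and every name are assumptions for illustration, not the actual onnxruntime signatures:

```cuda
#include <cstddef>

// Sketch: append the current step's keys/values after the cached past
// sequence. Layout assumed here is BNSH (batch, num_heads, seq_len, head_size).
__global__ void ConcatPastToPresentSketch(const float* past,    // [B, N, S_past, H]
                                          const float* new_kv,  // [B, N, S_new, H]
                                          float* present,       // [B, N, S_past + S_new, H]
                                          int past_seq_len, int new_seq_len, int head_size) {
  const int h = threadIdx.x;   // index within one head vector (h < head_size)
  const int s = blockIdx.x;    // target position in the present sequence
  const int bn = blockIdx.y;   // flattened batch * num_heads index
  const int total_seq_len = past_seq_len + new_seq_len;
  float* out = present + (static_cast<std::size_t>(bn) * total_seq_len + s) * head_size + h;
  if (s < past_seq_len) {
    *out = past[(static_cast<std::size_t>(bn) * past_seq_len + s) * head_size + h];
  } else {
    *out = new_kv[(static_cast<std::size_t>(bn) * new_seq_len + (s - past_seq_len)) * head_size + h];
  }
}
// Illustrative launch shape: grid = (S_past + S_new, B * N), block = head_size.
```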
tianleiwu: move fused attention to separated functions (3deaf61c)
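Splitting the fused attention paths out of one monolithic routine typically leaves a dispatcher that only selects a path. A hedged sketch of the shape of that refactor; the struct, flags, and function names below are illustrative, not the PR's actual symbols:

```cuda
// Each fused path becomes its own function with a common parameter bundle,
// and the dispatcher shrinks to a readable selection between them.
struct AttentionParamsSketch {
  int batch_size = 0;
  int sequence_length = 0;
  int num_heads = 0;
  int head_size = 0;
  bool use_fused_cross_attention = false;  // assumed selection flags
  bool use_fused_self_attention = false;
};

void FusedCrossAttention(const AttentionParamsSketch& p, void* workspace) { /* ... */ }
void FusedSelfAttention(const AttentionParamsSketch& p, void* workspace) { /* ... */ }
void UnfusedAttention(const AttentionParamsSketch& p, void* workspace) { /* ... */ }

void DispatchAttention(const AttentionParamsSketch& p, void* workspace) {
  if (p.use_fused_cross_attention) {
    FusedCrossAttention(p, workspace);
  } else if (p.use_fused_self_attention) {
    FusedSelfAttention(p, workspace);
  } else {
    UnfusedAttention(p, workspace);
  }
}
```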
tianleiwu requested a review from wangyems 2 years ago
tianleiwu requested a review from yufenglee 2 years ago
tianleiwu requested a review from aciddelgado 2 years ago
tianleiwu: merge QkvData to AttentionData (b548160c)
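Merging QkvData into AttentionData means the kernels receive one bundle of pointers instead of two parallel structs threaded through every call. A sketch under the assumption that the merged struct simply absorbs the Q/K/V workspace pointers; the member names are examples, not the exact onnxruntime fields:

```cuda
// Before the merge: AttentionData held the op's inputs/outputs while QkvData
// held the Q/K/V workspace pointers. After the merge, one struct carries both.
template <typename T>
struct AttentionDataSketch {
  // Inputs/outputs of the attention op.
  const T* gemm_buffer = nullptr;
  const T* bias = nullptr;
  T* output = nullptr;
  T* present = nullptr;
  // Formerly separate QkvData: workspace views of the packed/transposed Q, K, V.
  T* q = nullptr;
  T* k = nullptr;
  T* v = nullptr;
  T* scratch = nullptr;
};
```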
tianleiwu: compute present_size_per_batch_k when needed (b8059182)
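Computing present_size_per_batch_k only on the branch that consumes it avoids dead work on paths that produce no present KV cache. A sketch of the pattern; apart from present_size_per_batch_k itself, every name and the stride formula are assumptions:

```cuda
#include <cstddef>

// Pattern sketch: derive the per-batch stride of the present key cache only
// when a present cache is actually written, instead of unconditionally.
void WritePresentKeySketch(float* present_key, int num_heads,
                           int total_sequence_length, int qk_head_size) {
  if (present_key == nullptr) {
    return;  // no present cache requested: the stride is never computed
  }
  // Computed lazily, only on this path (stride formula is illustrative).
  const std::size_t present_size_per_batch_k =
      static_cast<std::size_t>(num_heads) * total_sequence_length * qk_head_size;
  // ... advance per-batch present_key pointers by present_size_per_batch_k ...
  (void)present_size_per_batch_k;
}
```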
tianleiwu: fix rocm build (344c3340)
tianleiwu: try fix orttraining build (538c9861)
tianleiwu: update includes (2caafbf6)
tianleiwu: format line length < 120 (071b95a7)
aciddelgado commented on 2023-09-19
aciddelgado approved these changes on 2023-09-19
tianleiwu merged 730fab30 into main 2 years ago
tianleiwu deleted the tlwu/refactor_decoder_att branch 2 years ago
tianleiwu added release:1.16.2
faxu added triage:approved
faxu added sdxl_llama
tianleiwu removed triage:approved
tianleiwu removed release:1.16.2
tianleiwu removed sdxl_llama
