Refactor Attention CUDA kernel #17578
move decoder attention and kv cache kernel (b1035615)
move fused attention to separate functions (3deaf61c; sketched after this list)
merge QkvData into AttentionData (b548160c; sketched after this list)
compute present_size_per_batch_k only when needed (b8059182; sketched after this list)
fix ROCm build (344c3340)
try to fix orttraining build (538c9861)
update includes (2caafbf6)
format lines to length < 120 (071b95a7)
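The fused-attention split (3deaf61c) is only described by its commit title. Below is a minimal sketch of that shape: the monolithic entry point becomes a dispatcher over separate fused and unfused helpers. Every type, field, and signature here (Status, AttentionParameters, fused_runner, QkvToContext, and so on) is an illustrative assumption, not copied from the PR diff.

```cpp
#include <cuda_runtime.h>

// Illustrative placeholders only; none of these signatures come from the PR.
struct Status { bool ok = true; };

struct AttentionParameters {
  int batch_size = 0;
  int sequence_length = 0;
  int num_heads = 0;
  int head_size = 0;
};

template <typename T>
struct AttentionData {
  const T* query = nullptr;
  const T* key = nullptr;
  const T* value = nullptr;
  T* output = nullptr;
  void* fused_runner = nullptr;  // non-null when a fused kernel is available
};

// Fused path: hand the whole problem to a single fused kernel runner.
template <typename T>
Status FusedAttention(const AttentionParameters&, AttentionData<T>&, cudaStream_t) {
  return Status{};  // ... launch fused kernel ...
}

// Unfused path: separate GEMM + softmax + GEMM kernels.
template <typename T>
Status UnfusedAttention(const AttentionParameters&, AttentionData<T>&, cudaStream_t) {
  return Status{};  // ... Q*K^T, softmax, then multiply by V ...
}

// The top-level entry point now only dispatches, instead of holding both paths inline.
template <typename T>
Status QkvToContext(const AttentionParameters& params, AttentionData<T>& data, cudaStream_t stream) {
  return (data.fused_runner != nullptr) ? FusedAttention(params, data, stream)
                                        : UnfusedAttention(params, data, stream);
}
```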
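Commit b548160c merges QkvData into AttentionData. Only those two struct names come from the commit message; the fields below are hypothetical and just illustrate the consolidation, so helpers take one parameter struct instead of two.

```cpp
#include <cstddef>

// Before: Q/K/V workspace pointers traveled in their own struct next to AttentionData.
template <typename T>
struct QkvData {
  T* q = nullptr;
  T* k = nullptr;
  T* v = nullptr;
  size_t workspace_bytes = 0;
};

// After: those members are folded into AttentionData.
template <typename T>
struct AttentionData {
  // pre-existing members (illustrative subset)
  const T* gemm_buffer = nullptr;
  const T* bias = nullptr;
  T* output = nullptr;
  T* present = nullptr;

  // members absorbed from QkvData
  T* q = nullptr;
  T* k = nullptr;
  T* v = nullptr;
  size_t workspace_bytes = 0;
};

// A helper that previously took (AttentionData&, QkvData&) now takes a single argument.
template <typename T>
void PrepareQkv(AttentionData<T>& data) {
  // ... fill data.q / data.k / data.v from data.gemm_buffer and data.bias ...
  (void)data;
}
```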
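Commit b8059182 computes present_size_per_batch_k only when it is needed. The variable name comes from the commit message; the cache layout assumed below (present K of shape batch x num_heads x max_sequence_length x head_size) is an assumption, so treat this as a sketch of the idea rather than the PR's code.

```cpp
#include <cstdint>

// Per-batch element count of the present K cache, assuming layout
// (batch, num_heads, max_sequence_length, head_size).
inline int64_t PresentSizePerBatchK(int64_t num_heads,
                                    int64_t max_sequence_length,
                                    int64_t head_size) {
  return num_heads * max_sequence_length * head_size;
}

// "When needed": skip the computation on paths that do not write a present KV cache.
inline int64_t MaybePresentSizePerBatchK(bool has_present_key,
                                         int64_t num_heads,
                                         int64_t max_sequence_length,
                                         int64_t head_size) {
  return has_present_key ? PresentSizePerBatchK(num_heads, max_sequence_length, head_size)
                         : 0;
}
```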
tianleiwu merged 730fab30 into main 2 years ago.
tianleiwu deleted the tlwu/refactor_decoder_att branch 2 years ago.
faxu added the triage:approved and sdxl_llama labels.
Assignees: No one assigned