Refactor Attention CUDA kernel #17578
move decoder attention and kv cache kernel (b1035615)
move fused attention to separate functions (3deaf61c; sketched after this list)
merge QkvData into AttentionData (b548160c; sketched after this list)
compute present_size_per_batch_k only when needed (b8059182; sketched after this list)
fix ROCm build (344c3340)
try to fix orttraining build (538c9861)
update includes (2caafbf6)
format lines to length < 120 (071b95a7)
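The fused-attention split (3deaf61c) is only described by its commit title. Below is a minimal sketch of that shape: the monolithic entry point becomes a dispatcher over separate fused and unfused helpers. Every type, field, and signature here (Status, AttentionParameters, fused_runner, QkvToContext, and so on) is an illustrative assumption, not copied from the PR diff.

```cpp
#include <cuda_runtime.h>

// Illustrative placeholders only; none of these signatures come from the PR.
struct Status { bool ok = true; };

struct AttentionParameters {
  int batch_size = 0;
  int sequence_length = 0;
  int num_heads = 0;
  int head_size = 0;
};

template <typename T>
struct AttentionData {
  const T* query = nullptr;
  const T* key = nullptr;
  const T* value = nullptr;
  T* output = nullptr;
  void* fused_runner = nullptr;  // non-null when a fused kernel is available
};

// Fused path: hand the whole problem to a single fused kernel runner.
template <typename T>
Status FusedAttention(const AttentionParameters&, AttentionData<T>&, cudaStream_t) {
  return Status{};  // ... launch fused kernel ...
}

// Unfused path: separate GEMM + softmax + GEMM kernels.
template <typename T>
Status UnfusedAttention(const AttentionParameters&, AttentionData<T>&, cudaStream_t) {
  return Status{};  // ... Q*K^T, softmax, then multiply by V ...
}

// The top-level entry point now only dispatches, instead of holding both paths inline.
template <typename T>
Status QkvToContext(const AttentionParameters& params, AttentionData<T>& data, cudaStream_t stream) {
  return (data.fused_runner != nullptr) ? FusedAttention(params, data, stream)
                                        : UnfusedAttention(params, data, stream);
}
```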
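Commit b548160c merges QkvData into AttentionData. Only those two struct names come from the commit message; the fields below are hypothetical and just illustrate the consolidation, so helpers take one parameter struct instead of two.

```cpp
#include <cstddef>

// Before: Q/K/V workspace pointers traveled in their own struct next to AttentionData.
template <typename T>
struct QkvData {
  T* q = nullptr;
  T* k = nullptr;
  T* v = nullptr;
  size_t workspace_bytes = 0;
};

// After: those members are folded into AttentionData.
template <typename T>
struct AttentionData {
  // pre-existing members (illustrative subset)
  const T* gemm_buffer = nullptr;
  const T* bias = nullptr;
  T* output = nullptr;
  T* present = nullptr;

  // members absorbed from QkvData
  T* q = nullptr;
  T* k = nullptr;
  T* v = nullptr;
  size_t workspace_bytes = 0;
};

// A helper that previously took (AttentionData&, QkvData&) now takes a single argument.
template <typename T>
void PrepareQkv(AttentionData<T>& data) {
  // ... fill data.q / data.k / data.v from data.gemm_buffer and data.bias ...
  (void)data;
}
```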
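Commit b8059182 computes present_size_per_batch_k only when it is needed. The variable name comes from the commit message; the cache layout assumed below (present K of shape batch x num_heads x max_sequence_length x head_size) is an assumption, so treat this as a sketch of the idea rather than the PR's code.

```cpp
#include <cstdint>

// Per-batch element count of the present K cache, assuming layout
// (batch, num_heads, max_sequence_length, head_size).
inline int64_t PresentSizePerBatchK(int64_t num_heads,
                                    int64_t max_sequence_length,
                                    int64_t head_size) {
  return num_heads * max_sequence_length * head_size;
}

// "When needed": skip the computation on paths that do not write a present KV cache.
inline int64_t MaybePresentSizePerBatchK(bool has_present_key,
                                         int64_t num_heads,
                                         int64_t max_sequence_length,
                                         int64_t head_size) {
  return has_present_key ? PresentSizePerBatchK(num_heads, max_sequence_length, head_size)
                         : 0;
}
```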
tianleiwu merged 730fab30 into main 2 years ago.
tianleiwu deleted the tlwu/refactor_decoder_att branch 2 years ago.
faxu added the triage:approved and sdxl_llama labels.
Assignees: No one assigned