onnxruntime
Add memory efficient attention from CUTLASS
#14343
Merged


tianleiwu: Add memory efficient attention from cutlass (6873fb27)
tianleiwu requested a review from yufenglee 3 years ago
tianleiwu requested a review from wangyems 3 years ago
tianleiwu marked this pull request as draft 3 years ago
tianleiwu: fix build errors (7f839113)
tianleiwu: Not compute cu_seqlens when no mask (cfa85596)
tianleiwu: update patch to fix build error (4f0664ef)
tianleiwu: not use patch file (2d07854d)
tianleiwu: remove "si" that declared but never referenced (bde4ae48)
tianleiwu marked this pull request as ready for review 3 years ago
yufenglee commented on 2023-01-19
yufenglee commented on 2023-01-19
yufenglee commented on 2023-01-19
yufenglee commented on 2023-01-19
wangyems commented on 2023-01-19
tianleiwu: Review feedback (bce7894e)
tianleiwu: split to multiple cu files to speed up build (4a003fdc)
tianleiwu: Merge branch 'main' into tlwu/cutlass_memory_efficient_attention (9cb53ca6)
tianleiwu: not enable two fused backend (7ebe107a)
wangyems approved these changes on 2023-01-20
tianleiwu added release:1.14
tianleiwu merged commit 414b012f into main 3 years ago
tianleiwu deleted the tlwu/cutlass_memory_efficient_attention branch 3 years ago
faxu added triage:approved
faxu removed release:1.14
