PR #14343 Add memory efficient attention from CUTLASS

tianleiwu merged 10 commits into main from tlwu/cutlass_memory_efficient_attention

Add memory efficient attention from cutlass

6873fb27

tianleiwu requested a review from yufenglee 3 years ago

tianleiwu requested a review from wangyems 3 years ago

tianleiwu marked this pull request as draft 3 years ago

fix build errors

7f839113

Not compute cu_seqlens when no mask

cfa85596

update patch to fix build error

4f0664ef

not use patch file

2d07854d

remove "si" that declared but never referenced

bde4ae48

tianleiwu marked this pull request as ready for review 3 years ago

yufenglee commented on 2023-01-19

wangyems commented on 2023-01-19

Review feedback

bce7894e

split to multiple cu files to speed up build

4a003fdc

Merge branch 'main' into tlwu/cutlass_memory_efficient_attention

9cb53ca6

not enable two fused backend

7ebe107a

wangyems approved these changes on 2023-01-20

tianleiwu added release:1.14

tianleiwu merged 414b012f into main 3 years ago

tianleiwu deleted the tlwu/cutlass_memory_efficient_attention branch 3 years ago

faxu added triage:approved

faxu removed release:1.14

Reviewers

wangyems

yufenglee

Labels

triage:approved