onnxruntime
Flash Attention v2 MHA
#17227
Merged


aciddelgado merged 70 commits into main from flash_v2_packed_mha
tianleiwu add flash attention v2
f39854de
tianleiwu remove bfloat16 kernels
aa1efa88
tianleiwu namespace
1fe60ae4
tianleiwu remove commented code
6cb6eba2
tianleiwu remove backward code
893b77ec
tianleiwu path
1195d3f6
aciddelgado flash v2 api and api call
4e59d957
aciddelgado api build fixes
b0f7f47d
aciddelgado builds successfully
0e79ec54
aciddelgado some fixes
2f61ac2d
aciddelgado use varlen to run Attention_Mask1D_FP16_B2 test
372141db
aciddelgado Packed MHA cleanup
95a7e358
aciddelgado flash attention runs
52553a77
aciddelgado update
018ecb6a
aciddelgado pre clean
632451eb
aciddelgado clean PMHA
98a4e5d0
aciddelgado packed mha flash
12f8db86
aciddelgado remove extraneous changes
91c1ab8f
aciddelgado requested a review from tianleiwu 2 years ago
github-advanced-security commented on 2023-08-18
tianleiwu commented on 2023-08-18
tianleiwu commented on 2023-08-18
tianleiwu commented on 2023-08-18
tianleiwu commented on 2023-08-18
aciddelgado reviewed changes
2fc7d86d
tianleiwu commented on 2023-08-18
aciddelgado reviewed changes
7cf03591
tianleiwu commented on 2023-08-18
tianleiwu commented on 2023-08-18
aciddelgado reviewed changes
f6554a13
tianleiwu commented on 2023-08-18
tianleiwu commented on 2023-08-18
aciddelgado reviewed changes
7a124878
tianleiwu commented on 2023-08-18
tianleiwu commented on 2023-08-18
tianleiwu commented on 2023-08-18
tianleiwu commented on 2023-08-18
tianleiwu commented on 2023-08-18
tianleiwu commented on 2023-08-18
aciddelgado reviewed changes
33a43aa7
tianleiwu changed the title from Flash v2 packed mha to Flash Attention v2 packed mha 2 years ago
aciddelgado namespace and USE_FLASH_ATTENTION flag
24524ccb
aciddelgado more compile flags flash
d9002f70
tianleiwu added the release:1.16 label
tianleiwu commented on 2023-08-21
tianleiwu commented on 2023-08-21
aciddelgado lint
3be4af27
aciddelgado gcc warnings in template
3798446f
aciddelgado address tianlei comments
666d3bf3
tianleiwu commented on 2023-08-22
snnn commented on 2023-08-22
clean up
545851aa
tianleiwu commented on 2023-08-22
cpplint
5798e0b1
tianleiwu commented on 2023-08-22
tianleiwu commented on 2023-08-22
tianleiwu refactoring
cad2901a
aciddelgado workspace as buffer. copyright and cgmanifest.
fdb189bd
aciddelgado requested a review 2 years ago
tianleiwu commented on 2023-08-22
aciddelgado undo cgmanifest
14d7db2e
tianleiwu commented on 2023-08-22
tianleiwu enable flash attention in MultiHeadAttention op
aba9cebf
tianleiwu fix attention test error in A100
a06da2eb
tianleiwu namespace from flash to onnxruntime::flash
ab015e63
tianleiwu undo cgmanifest
866414cf
tianleiwu add unit test
6d682ab0
tianleiwu enable flash attention in Attention op
300019e3
tianleiwu changed the title from Flash Attention v2 packed mha to Flash Attention v2 MHA 2 years ago
tianleiwu Merge branch 'main' into flash_v2_packed_mha
7e50d9a2
tianleiwu set proper nvcc threads to avoid OOM
7481ec34
tianleiwu --nvcc_threads=1 in build_cuda_c_api_package.sh
2823476c
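The two build commits above cap nvcc parallelism so that compiling the heavily templated flash attention CUDA kernels does not exhaust host memory. A minimal sketch of that kind of guard, in Python since onnxruntime's build driver is `build.py` (the function name and the 4 GB-per-thread figure here are illustrative assumptions, not the exact heuristic the build script uses):

```python
import os
from typing import Optional

def pick_nvcc_threads(available_mem_gb: float,
                      requested: Optional[int] = None) -> int:
    """Choose how many parallel nvcc compilation threads to allow.

    Template-heavy CUDA sources (like the flash attention kernels) can
    need several GB of RAM per nvcc instance, so the thread count is
    capped by available memory. The 4 GB-per-thread figure is an
    illustrative assumption, not onnxruntime's exact heuristic.
    """
    if requested is not None:
        # An explicit --nvcc_threads=1 (as in build_cuda_c_api_package.sh)
        # forces serial compilation on memory-constrained CI machines.
        return max(1, requested)
    mem_limited = max(1, int(available_mem_gb // 4))
    return min(os.cpu_count() or 1, mem_limited)
```

With only a few GB free this degrades gracefully to a single nvcc thread, trading build time for not getting OOM-killed mid-compile.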
tianleiwu requested a review from wangyems 2 years ago
tianleiwu requested a review from yufenglee 2 years ago
aciddelgado test script segfaults
c8801479
github-advanced-security commented on 2023-08-24
github-advanced-security commented on 2023-08-24
tianleiwu pass cuda device prop to flash attention
d2a040b3
tianleiwu add requirements for test_flash_attn.py
8b92a9b1
tianleiwu remove nvcc_threads logic
a0393e2c
aciddelgado flash attn test, pmha works, mha crashes
0830925a
github-advanced-security commented on 2023-08-24
tianleiwu check head size for efficient attention
1c967391
faxu removed the release:1.16 label
aciddelgado lint except lambda assignment
6d8e43d2
github-advanced-security commented on 2023-08-25
github-advanced-security commented on 2023-08-25
aciddelgado lint fix
4ceca3ba
tianleiwu line length < 120
3bfa3b55
aciddelgado flash v2 update
bdb17d5c
tianleiwu commented on 2023-08-29
tianleiwu commented on 2023-08-29
aciddelgado formatting
bfee28ef
aciddelgado flash benchmark script
a54a7b98
aciddelgado merge with main
b064a010
tianleiwu commented on 2023-08-29
github-advanced-security commented on 2023-08-29
aciddelgado io binding
f6927f7f
tianleiwu update benchmark
345f4e66
tianleiwu Add bert-base
da8bc503
aciddelgado added the release:1.16 label
aciddelgado Merge remote-tracking branch 'origin/main' into flash_v2_packed_mha
40a1f61c
aciddelgado merge main into branch for nuget fix
f7601235
tianleiwu update benchmark to support more input formats
92652c33
tianleiwu Merge branch 'flash_v2_packed_mha' of https://github.com/microsoft/on…
ee2296fc
tianleiwu seq len threshold to trigger flash for packed qkv
e998af75
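The commit above gates the flash kernel behind a sequence-length threshold for packed QKV, since for short sequences the existing fused path tends to win. A hypothetical sketch of that dispatch idea (the function name, the 512 threshold, and the 256 head-size cap are illustrative assumptions, not onnxruntime's actual constants):

```python
def use_flash_attention_packed_qkv(seq_len: int, head_size: int,
                                   min_seq_len: int = 512) -> bool:
    """Decide whether packed-QKV attention should dispatch to flash v2.

    Illustrative sketch only: flash attention requires a supported head
    size, and below a sequence-length threshold the fused/memory-efficient
    kernel is assumed to be faster, so flash is skipped there.
    """
    return head_size <= 256 and seq_len >= min_seq_len
```

Long sequences with common head sizes (e.g. 1024 x 64) take the flash path; short sequences fall back to the existing kernel regardless of head size.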
tianleiwu add back 2 lines
599d0198
aciddelgado flash attention flag in packed attention op test and a few more bench…
492c59f7
aciddelgado flash attention flag in packed attention op test and a few more bench…
6400a02b
aciddelgado Merge remote-tracking branch 'refs/remotes/origin/flash_v2_packed_mha…
1929de5d
aciddelgado specify TNLGv4 model for Turing Team in Benchmark
c880f084
aciddelgado remove env variable change from packed attention test
30c2f792
tianleiwu requested a review from snnn 2 years ago
tianleiwu dismissed these changes on 2023-08-31
aciddelgado python lint
01443ef6
aciddelgado dismissed their stale review via 01443ef6 2 years ago
yufenglee approved these changes on 2023-08-31
tianleiwu approved these changes on 2023-08-31
aciddelgado merged 44101e87 into main 2 years ago
aciddelgado deleted the flash_v2_packed_mha branch 2 years ago
natke added the triage:approved label
