onnxruntime
[CUDA] GroupQueryAttention operator using FlashAttention #17674
Merged

aciddelgado merged 115 commits into main from aciddelgado/group_query_attention
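For background: grouped-query attention (GQA) lets each key/value head serve a whole group of query heads, shrinking the KV cache relative to standard multi-head attention. The sketch below is an illustrative NumPy reference of that computation only; it is not the CUDA kernel added in this PR, and the function name and shapes are assumptions for the example.

```python
import numpy as np

def group_query_attention(q, k, v, causal=True):
    """Reference grouped-query attention (illustrative only, not the
    ONNX Runtime CUDA implementation).

    q: (num_heads, seq_len, head_dim)
    k, v: (kv_num_heads, seq_len, head_dim), num_heads % kv_num_heads == 0
    """
    num_heads, seq_len, head_dim = q.shape
    kv_num_heads = k.shape[0]
    group = num_heads // kv_num_heads
    # Each KV head is shared by `group` consecutive query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    if causal:
        # Mask out keys that lie in the future of each query position.
        mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the key axis.
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v
```

With `kv_num_heads == num_heads` this degenerates to ordinary multi-head attention; the memory savings come entirely from storing fewer K/V heads in the cache.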
tianleiwu add flash attention v2
f39854de
tianleiwu remove bfloat16 kernels
aa1efa88
tianleiwu namespace
1fe60ae4
tianleiwu remove commented code
6cb6eba2
tianleiwu remove backward code
893b77ec
tianleiwu path
1195d3f6
aciddelgado flash v2 api and api call
4e59d957
aciddelgado api build fixes
b0f7f47d
aciddelgado builds successfully
0e79ec54
aciddelgado some fixes
2f61ac2d
aciddelgado use varlen to run Attention_Mask1D_FP16_B2 test
372141db
aciddelgado Packed MHA cleanup
95a7e358
aciddelgado flash attention runs
52553a77
aciddelgado update
018ecb6a
aciddelgado pre clean
632451eb
aciddelgado clean PMHA
98a4e5d0
aciddelgado packed mha flash
12f8db86
aciddelgado remove extraneous changes
91c1ab8f
aciddelgado reviewed changes
2fc7d86d
aciddelgado reviewed changes
7cf03591
aciddelgado reviewed changes
f6554a13
aciddelgado reviewed changes
7a124878
aciddelgado reviewed changes
33a43aa7
aciddelgado namespace and USE_FLASH_ATTENTION flag
24524ccb
aciddelgado more compile flags flash
d9002f70
aciddelgado lint
3be4af27
aciddelgado gcc warnings in template
3798446f
aciddelgado address tianlei comments
666d3bf3
clean up
545851aa
cpplint
5798e0b1
tianleiwu refactoring
cad2901a
aciddelgado workspace as buffer. copyright and cgmanifest.
fdb189bd
aciddelgado undo cgmanifest
14d7db2e
tianleiwu enable flash attention in MultiHeadAttention op
aba9cebf
tianleiwu fix attention test error in A100
a06da2eb
tianleiwu namespace from flash to onnxruntime::flash
ab015e63
tianleiwu undo cgmanifest
866414cf
tianleiwu add unit test
6d682ab0
tianleiwu enable flash attention in Attention op
300019e3
tianleiwu Merge branch 'main' into flash_v2_packed_mha
7e50d9a2
tianleiwu set proper nvcc threads to avoid OOM
7481ec34
tianleiwu --nvcc_threads=1 in build_cuda_c_api_package.sh
2823476c
aciddelgado test script segfaults
c8801479
tianleiwu pass cuda device prop to flash attention
d2a040b3
tianleiwu add requirements for test_flash_attn.py
8b92a9b1
tianleiwu remove nvcc_threads logic
a0393e2c
aciddelgado flash attn test, pmha works, mha crashes
0830925a
tianleiwu check head size for efficient attention
1c967391
aciddelgado lint except lambda assignment
6d8e43d2
aciddelgado lint fix
4ceca3ba
tianleiwu line length < 120
3bfa3b55
aciddelgado flash v2 update
bdb17d5c
aciddelgado formatting
bfee28ef
aciddelgado flash benchmark script
a54a7b98
aciddelgado merge with main
b064a010
aciddelgado Update c-api-noopenmp-packaging-pipelines.yml
e7b7f2e9
aciddelgado io binding
f6927f7f
tianleiwu update benchmark
345f4e66
tianleiwu Add bert-base
da8bc503
aciddelgado Merge remote-tracking branch 'origin/main' into flash_v2_packed_mha
40a1f61c
aciddelgado merge main into branch for nuget fix
f7601235
aciddelgado Merge branch 'flash_v2_packed_mha' into flash_v2_no_cuda52
0dc8613d
tianleiwu update benchmark to support more input formats
92652c33
tianleiwu Merge branch 'flash_v2_packed_mha' of https://github.com/microsoft/on…
ee2296fc
tianleiwu seq len threshold to trigger flash for packed qkv
e998af75
tianleiwu add back 2 lines
599d0198
aciddelgado flash attention flag in packed attention op test and a few more bench…
492c59f7
aciddelgado flash attention flag in packed attention op test and a few more bench…
6400a02b
aciddelgado Merge remote-tracking branch 'refs/remotes/origin/flash_v2_packed_mha…
1929de5d
aciddelgado specify TNLGv4 model for Turing Team in Benchmark
c880f084
aciddelgado remove env variable change from packed attention test
30c2f792
aciddelgado python lint
01443ef6
aciddelgado Merge remote-tracking branch 'origin/main' into flash_v2_packed_mha
6a06d9e1
aciddelgado Merge branch 'flash_v2_packed_mha' into flash_v2_no_cuda52
e1eb49aa
aciddelgado start work on group query attention
7605bb49
aciddelgado work on check input and group query attention cc
0697d19e
aciddelgado more work on gqa
b9784dc6
aciddelgado gqa working with causal or without causal
5e7286ec
aciddelgado push before rebase
cb0a96f8
aciddelgado merge with main
afb493ee
aciddelgado gqa with past builds
6053c865
aciddelgado gqa working with past kv
11608be8
aciddelgado Merge remote-tracking branch 'origin/main' into aciddelgado/group_que…
9d31ad13
aciddelgado some code cleaning
9d2f9226
aciddelgado some fixes and clean up
bdb38670
aciddelgado no dumper
362c6aeb
aciddelgado premerge main
04801df4
aciddelgado lint
3a11592a
aciddelgado mergemain
2941dbc7
aciddelgado Merge remote-tracking branch 'origin/main' into aciddelgado/group_que…
d78f4769
aciddelgado fix illegal access memory issue
2d0b960b
aciddelgado clean up
5b076f70
aciddelgado bytes
3bf777c5
aciddelgado merge main
cdc65dcc
aciddelgado gqa final touches
0e33dc1b
aciddelgado build fixes gqa
de64ff4e
aciddelgado lint
7a2ad7ca
aciddelgado requested a review from tianleiwu 2 years ago
aciddelgado requested a review from yufenglee 2 years ago
github-advanced-security commented on 2023-09-22
tianleiwu commented on 2023-09-22
tianleiwu commented on 2023-09-22
aciddelgado benchmark gqa vs dmmha
7a476963
github-advanced-security commented on 2023-09-22
github-advanced-security commented on 2023-09-22
tianleiwu commented on 2023-09-23
tianleiwu commented on 2023-09-23
tianleiwu commented on 2023-09-23
tianleiwu commented on 2023-09-23
tianleiwu commented on 2023-09-23
aciddelgado fix comments
437d23c3
aciddelgado start work bnsh
365d0b51
aciddelgado bsnh present
470a8a7e
aciddelgado Support for BNSH format
05d1c56b
tianleiwu commented on 2023-09-25
tianleiwu commented on 2023-09-25
tianleiwu commented on 2023-09-25
tianleiwu commented on 2023-09-25
tianleiwu commented on 2023-09-25
tianleiwu commented on 2023-09-25
tianleiwu commented on 2023-09-25
aciddelgado bnsh attribute and benchmark
27dfac55
aciddelgado past-present bnsh, non-cache past-present.
6d681ee2
aciddelgado merge bnsh and no buff
0e76730d
github-advanced-security commented on 2023-09-28
aciddelgado lint and benchmark script
a7482edf
aciddelgado fix build issue
46f0ce45
tianleiwu commented on 2023-10-02
tianleiwu commented on 2023-10-02
aciddelgado fix build pipeline
86792141
tianleiwu commented on 2023-10-03
tianleiwu commented on 2023-10-03
tianleiwu commented on 2023-10-03
tianleiwu commented on 2023-10-03
tianleiwu commented on 2023-10-03
tianleiwu commented on 2023-10-03
aciddelgado pr cleanup
a0ec0eb2
aciddelgado int64 past sequence
b4082d10
tianleiwu commented on 2023-10-05
tianleiwu commented on 2023-10-06
tianleiwu commented on 2023-10-06
aciddelgado small review changes p1
befdb2d1
aciddelgado clang-format and update documentation
3fb6b9c6
tianleiwu dismissed these changes on 2023-10-06
tianleiwu changed the title from "Aciddelgado/group query attention" to "[CUDA] GroupQueryAttention operator using FlashAttention" 2 years ago
aciddelgado ignore whitespace when diff documentation
fcaba356
aciddelgado dismissed their stale review via fcaba356 2 years ago
aciddelgado ignore blank lines
3a06b64a
tianleiwu dismissed these changes on 2023-10-06
aciddelgado formatting whitespace
bbc47f0b
aciddelgado dismissed their stale review via bbc47f0b 2 years ago
tianleiwu approved these changes on 2023-10-09
aciddelgado merged 406cd324 into main 2 years ago
aciddelgado deleted the aciddelgado/group_query_attention branch 2 years ago
aciddelgado added release:1.16.2
faxu added triage:approved
faxu added sdxl_llama
tianleiwu removed triage:approved
tianleiwu removed release:1.16.2
tianleiwu removed sdxl_llama
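For readers unfamiliar with the FlashAttention kernels this PR dispatches to: they avoid materializing the full attention matrix by streaming over key/value blocks and maintaining a running ("online") softmax. The single-head, non-causal sketch below illustrates only that rescaling recurrence; the function name and block size are assumptions for the example, not the actual CUDA code.

```python
import numpy as np

def flash_attention_1head(q, k, v, block=2):
    """Online-softmax attention over KV blocks (a sketch of the
    FlashAttention recurrence, not the real CUDA implementation).

    q: (seq_q, d); k, v: (seq_k, d). Non-causal for brevity.
    """
    seq_q, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((seq_q, d))
    row_max = np.full((seq_q, 1), -np.inf)  # running max of scores
    row_sum = np.zeros((seq_q, 1))          # running softmax denominator
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T * scale                # partial scores for this block
        new_max = np.maximum(row_max, s.max(axis=-1, keepdims=True))
        # Rescale previously accumulated numerator and denominator
        # so everything stays relative to the new running max.
        correction = np.exp(row_max - new_max)
        p = np.exp(s - new_max)
        row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ vb
        row_max = new_max
    return out / row_sum
```

Because each block only touches O(block * d) memory, the full seq_q x seq_k score matrix never exists at once, which is what makes long-context GQA with a KV cache practical on GPU.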
