onnxruntime
ROCm Flash Attention
#14838
Merged
cloudhan
merged 26 commits into
main
from
guangyunhan/fused-gemms-flash-attention
cloudhan
changed the title
from
Guangyunhan/fused gemms flash attention
to
ROCm Flash Attention
3 years ago
cloudhan
force pushed
from
cf9ee9c9
to
5a673be0
3 years ago
cloudhan
force pushed
from
5a673be0
to
ae5d8769
3 years ago
cloudhan
force pushed
from
ae5d8769
to
4072ddde
3 years ago
cloudhan
force pushed
from
8969aa72
to
6891b943
3 years ago
Base automatically changed from
guangyunhan/refactor-rocm-attention
to
main
3 years ago
cloudhan
force pushed
from
6891b943
to
1d0fc4e5
3 years ago
cloudhan
force pushed
from
1d0fc4e5
to
0410fa60
3 years ago
github-advanced-security
commented on 2023-03-06
cloudhan
marked this pull request as ready for review
3 years ago
cloudhan
requested a review
from
zhangyaobit
3 years ago
cloudhan
requested a review
from
tianleiwu
3 years ago
cloudhan
requested a review
from
ytaous
3 years ago
cloudhan
requested a review
from
abudup
3 years ago
tianleiwu
commented on 2023-03-08
tianleiwu
commented on 2023-03-08
tianleiwu
commented on 2023-03-08
tianleiwu
commented on 2023-03-08
tianleiwu
commented on 2023-03-08
tianleiwu
commented on 2023-03-08
tianleiwu
commented on 2023-03-08
tianleiwu
commented on 2023-03-08
ytaous
commented on 2023-03-08
cloudhan
force pushed
from
eb958309
to
96c7e0dc
3 years ago
Update CK to latest
f109ef70
Add basic tunable
8dab1b12
Add ck impl
bf653fb3
Remove unused parameter
61f6f78a
Change mask_index_dims to use TensorShapeVector instead of gsl::span
6d077f84
Add test and profile in gemm_softmax_gemm_permute_test.py and corresp…
d667c5b3
Split ck attn to masked and biased cases
79c94966
Fix crash when total_sequence_length < sequence_length, the workspace…
752c211a
Add more profile and tests
740bf2ca
Make workspace_buffer void*
fde2957e
Add ort internal ck flash attention instances
05261cd9
Switch to use ort's flash attention instances
4da79f76
Bring back mask conversion (i32 -> f16) for better performance
9b54d9e0
Add support for non-biased+non-masked and biased+masked versions
9efc8608
Change params signature
ea628a2f
Disable in repo ck srcs debug symbol
7b1b205c
Fix and cleanup gemm_softmax_gemm_permute_test.py
ac1cff3e
Cleanup
7a5b70b6
Refine doc comment
30ea0808
cloudhan
force pushed
from
fe3fb169
to
086276d0
3 years ago
Clean unused code
f062a4ab
Stop using scipy.special.softmax
fa81ba78
Fix megatron mask conversion
cda18101
Enable megatron mask (mask_4d) tests
de58ac87
cloudhan
force pushed
from
250e1b02
to
de58ac87
3 years ago
cloudhan
requested a review
from
ytaous
3 years ago
cloudhan
requested a review
from
tianleiwu
3 years ago
tianleiwu
commented on 2023-03-11
Compute TunableOp workspace iff TunableOp is enabled
98f1fd8c
Format
0cbe8a5b
tianleiwu
commented on 2023-03-14
cloudhan
force pushed
from
b3d59182
to
de40e166
3 years ago
Disable one instance
2360a4a2
cloudhan
force pushed
from
de40e166
to
2360a4a2
3 years ago
tianleiwu
approved these changes on 2023-03-15
cloudhan
merged
a5ab8824
into main
3 years ago
cloudhan
deleted the guangyunhan/fused-gemms-flash-attention branch
3 years ago
Reviewers
tianleiwu
ytaous
github-advanced-security
zhangyaobit
abudup
Assignees
No one assigned
Labels
None yet
Milestone
No milestone