onnxruntime
ROCm Flash Attention
#14838
Merged

cloudhan
cloudhan changed the title from "Guangyunhan/fused gemms flash attention" to "ROCm Flash Attention" 3 years ago
cloudhan force pushed from cf9ee9c9 to 5a673be0 3 years ago
cloudhan force pushed from 5a673be0 to ae5d8769 3 years ago
cloudhan force pushed from ae5d8769 to 4072ddde 3 years ago
ytaous
cloudhan force pushed from 8969aa72 to 6891b943 3 years ago
Base automatically changed from guangyunhan/refactor-rocm-attention to main 3 years ago
cloudhan force pushed from 6891b943 to 1d0fc4e5 3 years ago
cloudhan force pushed from 1d0fc4e5 to 0410fa60 3 years ago
github-advanced-security
github-advanced-security commented on 2023-03-06
ytaous
cloudhan marked this pull request as ready for review 3 years ago
cloudhan requested a review from zhangyaobit 3 years ago
cloudhan requested a review from tianleiwu 3 years ago
cloudhan requested a review from ytaous 3 years ago
cloudhan requested a review from abudup 3 years ago
tianleiwu
tianleiwu commented on 2023-03-08
tianleiwu
tianleiwu commented on 2023-03-08
tianleiwu
tianleiwu commented on 2023-03-08
tianleiwu
tianleiwu commented on 2023-03-08
ytaous
tianleiwu
tianleiwu commented on 2023-03-08
tianleiwu
tianleiwu commented on 2023-03-08
tianleiwu
tianleiwu commented on 2023-03-08
tianleiwu
tianleiwu commented on 2023-03-08
ytaous
ytaous commented on 2023-03-08
cloudhan
cloudhan force pushed from eb958309 to 96c7e0dc 3 years ago
cloudhan Update CK to latest
f109ef70
cloudhan Add basic tunable
8dab1b12
cloudhan Add ck impl
bf653fb3
cloudhan Remove unused parameter
61f6f78a
cloudhan Change mask_index_dims to use TensorShapeVector instead of gsl::span
6d077f84
cloudhan Add test and profile in gemm_softmax_gemm_permute_test.py and corresp…
d667c5b3
cloudhan Split ck attn to masked and biased cases
79c94966
cloudhan Fix crash when total_sequence_length < sequence_length, the workspace…
752c211a
cloudhan Add more profile and tests
740bf2ca
cloudhan Make workspace_buffer void*
fde2957e
cloudhan Add ort internal ck flash attention instances
05261cd9
cloudhan Switch to use ort's flash attention instances
4da79f76
cloudhan Bring back mask conversion (i32 -> f16) for better performance
9b54d9e0
cloudhan Add supports for non-biased+non-masked and biased+masked versions
9efc8608
cloudhan Change params signature
ea628a2f
cloudhan Disable in repo ck srcs debug symbol
7b1b205c
cloudhan Fix and cleanup gemm_softmax_gemm_permute_test.py
ac1cff3e
cloudhan Cleanup
7a5b70b6
cloudhan Refine doc comment
30ea0808
cloudhan force pushed from fe3fb169 to 086276d0 3 years ago
cloudhan Clean unused code
f062a4ab
cloudhan Stop using scipy.special.softmax
fa81ba78
cloudhan Fix megatron mask conversion
cda18101
cloudhan Enable megatron mask (mask_4d) tests
de58ac87
cloudhan force pushed from 250e1b02 to de58ac87 3 years ago
cloudhan requested a review from ytaous 3 years ago
cloudhan requested a review from tianleiwu 3 years ago
tianleiwu
tianleiwu commented on 2023-03-11
cloudhan Compute TunbaleOp workspace iff TunableOp is enabled
98f1fd8c
cloudhan Format
0cbe8a5b
tianleiwu
tianleiwu commented on 2023-03-14
cloudhan force pushed from b3d59182 to de40e166 3 years ago
cloudhan Disable one instance
2360a4a2
cloudhan force pushed from de40e166 to 2360a4a2 3 years ago
cloudhan
tianleiwu
tianleiwu approved these changes on 2023-03-15
cloudhan merged a5ab8824 into main 3 years ago
cloudhan deleted the guangyunhan/fused-gemms-flash-attention branch 3 years ago
