PR #14838 ROCm Flash Attention

cloudhan merged 26 commits into main from guangyunhan/fused-gemms-flash-attention

cloudhan changed the title ~~Guangyunhan/fused gemms flash attention~~ ROCm Flash Attention 3 years ago

cloudhan force pushed from cf9ee9c9 to 5a673be0 3 years ago

cloudhan force pushed from 5a673be0 to ae5d8769 3 years ago

cloudhan force pushed from ae5d8769 to 4072ddde 3 years ago

cloudhan force pushed from 8969aa72 to 6891b943 3 years ago

Base automatically changed from guangyunhan/refactor-rocm-attention to main 3 years ago

cloudhan force pushed from 6891b943 to 1d0fc4e5 3 years ago

cloudhan force pushed from 1d0fc4e5 to 0410fa60 3 years ago

github-advanced-security commented on 2023-03-06

cloudhan marked this pull request as ready for review 3 years ago

cloudhan requested a review from zhangyaobit 3 years ago

cloudhan requested a review from tianleiwu 3 years ago

cloudhan requested a review from ytaous 3 years ago

cloudhan requested a review from abudup 3 years ago

tianleiwu commented on 2023-03-08

ytaous commented on 2023-03-08

cloudhan force pushed from eb958309 to 96c7e0dc 3 years ago

Update CK to latest

f109ef70

Add basic tunable

8dab1b12

Add ck impl

bf653fb3

Remove unused parameter

61f6f78a

Change mask_index_dims to use TensorShapeVector instead of gsl::span

6d077f84

Add test and profile in gemm_softmax_gemm_permute_test.py and corresp…

d667c5b3

Split ck attn to masked and biased cases

79c94966

Fix crash when total_sequence_length < sequence_length, the workspace…

752c211a

Add more profile and tests

740bf2ca

Make workspace_buffer void*

fde2957e

Add ort internal ck flash attention instances

05261cd9

Switch to use ort's flash attention instances

4da79f76

Bring back mask conversion (i32 -> f16) for better performance

9b54d9e0

Add support for non-biased+non-masked and biased+masked versions

9efc8608

Change params signature

ea628a2f

Disable in repo ck srcs debug symbol

7b1b205c

Fix and cleanup gemm_softmax_gemm_permute_test.py

ac1cff3e

Cleanup

7a5b70b6

Refine doc comment

30ea0808

cloudhan force pushed from fe3fb169 to 086276d0 3 years ago

Clean unused code

f062a4ab

Stop using scipy.special.softmax

fa81ba78

Fix megatron mask conversion

cda18101

Enable megatron mask (mask_4d) tests

de58ac87

cloudhan force pushed from 250e1b02 to de58ac87 3 years ago

cloudhan requested a review from ytaous 3 years ago

cloudhan requested a review from tianleiwu 3 years ago

tianleiwu commented on 2023-03-11

Compute TunableOp workspace iff TunableOp is enabled

98f1fd8c

Format

0cbe8a5b

tianleiwu commented on 2023-03-14

cloudhan force pushed from b3d59182 to de40e166 3 years ago

Disable one instance

2360a4a2

cloudhan force pushed from de40e166 to 2360a4a2 3 years ago

tianleiwu approved these changes on 2023-03-15

cloudhan merged a5ab8824 into main 3 years ago

cloudhan deleted the guangyunhan/fused-gemms-flash-attention branch 3 years ago

Reviewers

tianleiwu

ytaous

github-advanced-security

zhangyaobit

abudup


onnxruntime ROCm Flash Attention #14838 Merged
