[CUDA] Add PackedMultiHeadAttention operator #16779
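For context on what the new operator consumes: packed multi-head attention strips padding from a batch of variable-length sequences and operates only on the real tokens, described by a token_offset map and a cumulative_sequence_length array. The sketch below is an illustrative NumPy reconstruction of that packing step; the function and variable names are hypothetical and are not code from this PR or the CUDA kernels it adds.

```python
# Illustrative sketch (not code from this PR): how a padded batch can be packed
# into the token_offset / cumulative_sequence_length form that a packed MHA
# operator consumes. Names such as pack_padded_input are hypothetical.
import numpy as np

def pack_padded_input(x, seq_lens):
    """x: (batch, max_seq_len, hidden); seq_lens: (batch,) valid lengths."""
    batch, max_seq_len, hidden = x.shape

    # cumulative_sequence_length: (batch + 1,), prefix sums of the valid lengths.
    cum_seq_len = np.zeros(batch + 1, dtype=np.int32)
    cum_seq_len[1:] = np.cumsum(seq_lens)

    # token_offset: flat indices of valid tokens first, padded tokens after,
    # so a kernel can later restore the original (batch, seq_len) layout.
    valid, padded = [], []
    for b in range(batch):
        for s in range(max_seq_len):
            (valid if s < seq_lens[b] else padded).append(b * max_seq_len + s)
    token_offset = np.array(valid + padded, dtype=np.int32).reshape(batch, max_seq_len)

    # Packed tensor holds only the real tokens, concatenated across the batch.
    packed = x.reshape(batch * max_seq_len, hidden)[np.array(valid)]
    return packed, token_offset, cum_seq_len

x = np.random.rand(2, 4, 8).astype(np.float32)              # batch=2, max_seq_len=4
packed, token_offset, cum_seq_len = pack_padded_input(x, np.array([3, 2]))
print(packed.shape)   # (5, 8): 3 + 2 real tokens
print(cum_seq_len)    # [0 3 5]
```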
add packed MHA op
d3d0b43a
tianleiwu marked this pull request as draft 2 years ago
fix error message in MHA shape inference
e0467dc6
refactor packed attention
80537acf
register op
de290427
clean unused code in packed attention
3f4e7a0f
remove qkv_hidden_sizes_ from base
1787acbb
format
6e393948
expose LaunchTransposeRemovePadding
c1161901
draft kernel
f3a686cf
Add unit test
31681512
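As a rough illustration of what a unit test for the new contrib op might construct: the snippet below builds a tiny graph containing a com.microsoft PackedMultiHeadAttention node with onnx.helper. The input names, ordering, and attributes are assumptions on my part for illustration and should be checked against the operator's actual schema; they are not taken from this PR.

```python
# Hedged sketch: building a small test graph with the com.microsoft
# PackedMultiHeadAttention contrib op via onnx.helper. Input names, ordering,
# and attributes below are assumptions, not the verified schema.
import onnx
from onnx import TensorProto, helper

num_heads, head_size = 8, 64
hidden = num_heads * head_size

node = helper.make_node(
    "PackedMultiHeadAttention",
    # Empty string leaves the optional bias input unset (assumed position).
    inputs=["query", "key", "value", "", "token_offset", "cumulative_sequence_length"],
    outputs=["output"],
    domain="com.microsoft",
    num_heads=num_heads,
)

graph = helper.make_graph(
    [node],
    "packed_mha_test",
    inputs=[
        helper.make_tensor_value_info("query", TensorProto.FLOAT16, ["token_count", hidden]),
        helper.make_tensor_value_info("key", TensorProto.FLOAT16, ["token_count", hidden]),
        helper.make_tensor_value_info("value", TensorProto.FLOAT16, ["token_count", hidden]),
        helper.make_tensor_value_info("token_offset", TensorProto.INT32, ["batch", "seq_len"]),
        helper.make_tensor_value_info("cumulative_sequence_length", TensorProto.INT32, ["batch_plus_1"]),
    ],
    outputs=[helper.make_tensor_value_info("output", TensorProto.FLOAT16, ["token_count", hidden])],
)

model = helper.make_model(
    graph,
    opset_imports=[helper.make_opsetid("", 17), helper.make_opsetid("com.microsoft", 1)],
)
onnx.save(model, "packed_mha_test.onnx")
# A CUDA build of onnxruntime would then load and run this model in the unit test.
```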
fix build
6101f32f
Add test case
694577ab
fix debug code
e4daa1b4
fix typo
63a4395c
test trt, cutlass and unfused separately
e923ff74
Merge branch 'main' into tlwu/packed_mha
0a6c54d3
instantiate TrtFusedAttention
689299c8
update doc
736975b4
format
e49e004f
exclude from hipify
f924a187
add more test cases
cdc73698
Merge branch 'main' into tlwu/packed_mha
813aa06d
undo test_data_gen script
5a964887
tianleiwu marked this pull request as ready for review 2 years ago
test cutlass broadcast relative positional bias
43595e58
Merge branch 'tlwu/packed_mha' of https://github.com/microsoft/onnxru…
f7ee1490
Merge branch 'main' into tlwu/packed_mha
c645cd84
gh-yewang approved these changes on 2023-07-28
tianleiwu merged 742edec5 into main 2 years ago
tianleiwu deleted the tlwu/packed_mha branch 2 years ago