[CUDA] Add PackedMultiHeadAttention operator #16779
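For context on what the new operator consumes: packed multi-head attention strips padding from a batch of variable-length sequences and operates only on the real tokens, described by a token_offset map and a cumulative_sequence_length array. The sketch below is an illustrative NumPy reconstruction of that packing step; the function and variable names are hypothetical and are not code from this PR or the CUDA kernels it adds.

```python
# Illustrative sketch (not code from this PR): how a padded batch can be packed
# into the token_offset / cumulative_sequence_length form that a packed MHA
# operator consumes. Names such as pack_padded_input are hypothetical.
import numpy as np

def pack_padded_input(x, seq_lens):
    """x: (batch, max_seq_len, hidden); seq_lens: (batch,) valid lengths."""
    batch, max_seq_len, hidden = x.shape

    # cumulative_sequence_length: (batch + 1,), prefix sums of the valid lengths.
    cum_seq_len = np.zeros(batch + 1, dtype=np.int32)
    cum_seq_len[1:] = np.cumsum(seq_lens)

    # token_offset: flat indices of valid tokens first, padded tokens after,
    # so a kernel can later restore the original (batch, seq_len) layout.
    valid, padded = [], []
    for b in range(batch):
        for s in range(max_seq_len):
            (valid if s < seq_lens[b] else padded).append(b * max_seq_len + s)
    token_offset = np.array(valid + padded, dtype=np.int32).reshape(batch, max_seq_len)

    # Packed tensor holds only the real tokens, concatenated across the batch.
    packed = x.reshape(batch * max_seq_len, hidden)[np.array(valid)]
    return packed, token_offset, cum_seq_len

x = np.random.rand(2, 4, 8).astype(np.float32)              # batch=2, max_seq_len=4
packed, token_offset, cum_seq_len = pack_padded_input(x, np.array([3, 2]))
print(packed.shape)   # (5, 8): 3 + 2 real tokens
print(cum_seq_len)    # [0 3 5]
```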
add packed MHA op
d3d0b43a
tianleiwu marked this pull request as draft 2 years ago
fix error message in MHA shape inference
e0467dc6
refactor packed attention
80537acf
register op
de290427
clean unused code in packed attention
3f4e7a0f
remove qkv_hidden_sizes_ from base
1787acbb
format
6e393948
expose LaunchTransposeRemovePadding
c1161901
draft kernel
f3a686cf
Add unit test
31681512
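As a rough illustration of what a unit test for the new contrib op might construct: the snippet below builds a tiny graph containing a com.microsoft PackedMultiHeadAttention node with onnx.helper. The input names, ordering, and attributes are assumptions on my part for illustration and should be checked against the operator's actual schema; they are not taken from this PR.

```python
# Hedged sketch: building a small test graph with the com.microsoft
# PackedMultiHeadAttention contrib op via onnx.helper. Input names, ordering,
# and attributes below are assumptions, not the verified schema.
import onnx
from onnx import TensorProto, helper

num_heads, head_size = 8, 64
hidden = num_heads * head_size

node = helper.make_node(
    "PackedMultiHeadAttention",
    # Empty string leaves the optional bias input unset (assumed position).
    inputs=["query", "key", "value", "", "token_offset", "cumulative_sequence_length"],
    outputs=["output"],
    domain="com.microsoft",
    num_heads=num_heads,
)

graph = helper.make_graph(
    [node],
    "packed_mha_test",
    inputs=[
        helper.make_tensor_value_info("query", TensorProto.FLOAT16, ["token_count", hidden]),
        helper.make_tensor_value_info("key", TensorProto.FLOAT16, ["token_count", hidden]),
        helper.make_tensor_value_info("value", TensorProto.FLOAT16, ["token_count", hidden]),
        helper.make_tensor_value_info("token_offset", TensorProto.INT32, ["batch", "seq_len"]),
        helper.make_tensor_value_info("cumulative_sequence_length", TensorProto.INT32, ["batch_plus_1"]),
    ],
    outputs=[helper.make_tensor_value_info("output", TensorProto.FLOAT16, ["token_count", hidden])],
)

model = helper.make_model(
    graph,
    opset_imports=[helper.make_opsetid("", 17), helper.make_opsetid("com.microsoft", 1)],
)
onnx.save(model, "packed_mha_test.onnx")
# A CUDA build of onnxruntime would then load and run this model in the unit test.
```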
fix build
6101f32f
Add test case
694577ab
fix debug code
e4daa1b4
fix typo
63a4395c
test trt, cutlass and unfused separately
e923ff74
Merge branch 'main' into tlwu/packed_mha
0a6c54d3
instantiate TrtFusedAttention
689299c8
update doc
736975b4
format
e49e004f
exclude from hipify
f924a187
add more test cases
cdc73698
Merge branch 'main' into tlwu/packed_mha
813aa06d
undo test_data_gen script
5a964887
tianleiwu marked this pull request as ready for review 2 years ago
test cutlass broadcast relative positional bias
43595e58
Merge branch 'tlwu/packed_mha' of https://github.com/microsoft/onnxru…
f7ee1490
Merge branch 'main' into tlwu/packed_mha
c645cd84
gh-yewang approved these changes on 2023-07-28
tianleiwu merged 742edec5 into main 2 years ago
tianleiwu deleted the tlwu/packed_mha branch 2 years ago