onnxruntime
Improve LongformerAttention performance
#12448
Merged


tianleiwu
tianleiwu sequence_index from BxS to S; new stream for copy
da1f2e0d
tianleiwu marked this pull request as draft 3 years ago
tianleiwu undo memcpy stream
c9c3f961
tianleiwu remove AutoDestoryCudaStream
a689d94a
tianleiwu undo event is_copy_done
dcb6dea9
tianleiwu merge input and output pointers in scratch2
abf2cb54
tianleiwu update comments
1a6a267f
tianleiwu add AddBiasTranspose kernel, new format of weights
b138e593
tianleiwu update default benchmark tests
6cd7aba2
tianleiwu Merge branch 'master'
619238d1
tianleiwu add new format 0 for weight and bias
136c7a7f
tianleiwu fix cpplint warnings
802ec162
lgtm-com
tianleiwu Use compact global_q in GEMM
d9ca1c63
lgtm-com
tianleiwu use half2/float4 in AddBiasTranspose
3ac6038b
tianleiwu add experiment code
d9546a99
lgtm-com
tianleiwu Add an option to test half2
8955a6e9
tianleiwu add default benchmark setting for A100
9ec6bbdd
lgtm-com
tianleiwu update benchmark settings
0afaeac2
tianleiwu avoid integer overflow
72cebae7
tianleiwu check gpu memory; output summary in benchmark
fff0c17a
tianleiwu refine check inputs
3cd61072
tianleiwu Merge branch 'main' into tlwu/improve_longformer_attention_perf
6db296d4
lgtm-com
tianleiwu fix divide by zero, and update batch sizes
3991369d
tianleiwu change test_times 100=>1000, and total_runs 5=>1
f3c7fe25
lgtm-com
lgtm-com
tianleiwu update description and summary filename
c1f5fd0d
lgtm-com
tianleiwu Add Half8 kernel
eaaffbf4
tianleiwu fix lint warnings
67fe3c3d
tianleiwu undo change in TransformerOptions
432c4f06
lgtm-com
tianleiwu remove half8
0f7010e8
tianleiwu marked this pull request as ready for review 3 years ago
tianleiwu changed the title from "[WIP] Improve LongformerAttention performance" to "Improve LongformerAttention performance" 3 years ago
tianleiwu requested a review from wangyems 3 years ago
lgtm-com
tianleiwu adjust import
ad0c09c1
lgtm-com
tianleiwu fix amd training pipeline build error
6a44ff4a
tianleiwu change merge_qkv to be default format
cbb9cd19
tianleiwu print to logging
ebfafd92
lgtm-com
tianleiwu update unit tests with non empty bias value
9d3c8c61
tianleiwu format python
9f91a60d
wangyems
wangyems dismissed these changes on 2022-08-16
lgtm-com
tianleiwu add rocblasGemmHelper to fix amd build
33af03e3
tianleiwu dismissed their stale review via 33af03e3 3 years ago
tianleiwu rocblasGemmHelper with extra hipDeviceProp_t arg
18db2bdb
tianleiwu add rocblasGemmStridedBatchedHelper
24a3f306
lgtm-com
tianleiwu Fix rocblasGemmStridedBatchedHelper
c48cdc90
lgtm-com
wangyems requested a review from wangyems 3 years ago
wangyems
wangyems approved these changes on 2022-08-17
tianleiwu merged ce01ed02 into main 3 years ago
tianleiwu deleted the tlwu/improve_longformer_attention_perf branch 3 years ago
