Improve LongformerAttention performance #12448
sequence_index from BxS to S; new stream for copy
da1f2e0d
tianleiwu
marked this pull request as draft 3 years ago
undo memcpy stream
c9c3f961
remove AutoDestoryCudaStream
a689d94a
undo event is_copy_done
dcb6dea9
merge input and output pointers in scratch2
abf2cb54
update comments
1a6a267f
add AddBiasTranspose kernel, new format of weights
b138e593
update default benchmark tests
6cd7aba2
Merge branch 'master'
619238d1
add new format 0 for weight and bias
136c7a7f
fix cpplint warnings
802ec162
Use compact global_q in GEMM
d9ca1c63
use half2/float4 in AddBiasTranspose
3ac6038b
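The `half2`/`float4` commit above vectorizes AddBiasTranspose so each thread moves two halves or four floats per memory transaction instead of one scalar. A minimal CPU-side sketch of the vectorization idea, assuming a row-major `[n_rows, hidden]` buffer with `hidden` divisible by 4 (the struct and function names here are illustrative, not the actual kernel):

```cpp
#include <cassert>
#include <cstddef>

// Mirrors CUDA's built-in float4 vector type.
struct Float4 { float x, y, z, w; };

// Add a per-hidden-dim bias to a [n_rows, hidden] buffer, 4 floats at a time.
// On the GPU the same reinterpret trick turns 4 scalar loads into 1 wide load.
void AddBiasVec4(const float* input, const float* bias, float* output,
                 size_t n_rows, size_t hidden) {
  assert(hidden % 4 == 0);
  const Float4* in4 = reinterpret_cast<const Float4*>(input);
  const Float4* bias4 = reinterpret_cast<const Float4*>(bias);
  Float4* out4 = reinterpret_cast<Float4*>(output);
  size_t h4 = hidden / 4;
  for (size_t i = 0; i < n_rows * h4; ++i) {
    Float4 v = in4[i];
    Float4 b = bias4[i % h4];  // bias repeats for every row
    out4[i] = {v.x + b.x, v.y + b.y, v.z + b.z, v.w + b.w};
  }
}
```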
add experiment code
d9546a99
Add an option to test half2
8955a6e9
add default benchmark setting for A100
9ec6bbdd
update benchmark settings
0afaeac2
avoid integer overflow
72cebae7
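"Avoid integer overflow" most likely refers to index arithmetic: for large batch × sequence × hidden sizes a 32-bit product can wrap past `INT32_MAX`. A hedged sketch of the standard fix, widening before multiplying (the helper name is hypothetical):

```cpp
#include <cstdint>

// Flattened offset into a [batch, seq_len, hidden] tensor.
// Casting the first operand to int64_t widens the whole product,
// so batch * seq_len * hidden cannot wrap even past INT32_MAX.
int64_t FlatOffset(int batch, int seq_len, int hidden) {
  return static_cast<int64_t>(batch) * seq_len * hidden;
}
```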
check gpu memory; output summary in benchmark
fff0c17a
refine check inputs
3cd61072
Merge branch 'main' into tlwu/improve_longformer_attention_perf
6db296d4
fix divide by zero, and update batch sizes
3991369d
change test_times 100=>1000, and total_runs 5=>1
f3c7fe25
update description and summary filename
c1f5fd0d
Add Half8 kernel
eaaffbf4
fix lint warnings
67fe3c3d
undo change in TransformerOptions
432c4f06
remove half8
0f7010e8
tianleiwu
marked this pull request as ready for review 3 years ago
tianleiwu
changed the title from [WIP] Improve LongformerAttention performance to Improve LongformerAttention performance 3 years ago
adjust import
ad0c09c1
fix amd training pipeline build error
6a44ff4a
change merge_qkv to be default format
cbb9cd19
print to logging
ebfafd92
update unit tests with non empty bias value
9d3c8c61
format python
9f91a60d
wangyems
dismissed these changes
on 2022-08-16
add rocblasGemmHelper to fix amd build
33af03e3
tianleiwu
dismissed their stale review
via 33af03e3
3 years ago
rocblasGemmHelper with extra hipDeviceProp_t arg
18db2bdb
add rocblasGemmStridedBatchedHelper
24a3f306
Fix rocblasGemmStridedBatchedHelper
c48cdc90
wangyems
approved these changes
on 2022-08-17
tianleiwu
merged
ce01ed02
into main 3 years ago
tianleiwu
deleted the tlwu/improve_longformer_attention_perf branch 3 years ago