onnxruntime
Optimize FastGelu with float2 and float4 vectorized kernels on ROCm
#11491
Merged

hubertlu-tw
Using vectorized loads (float2) for fp16 to improve performance
664bb50e
Fix a few warnings from cpplint
68904e56
Fix a few warnings from cpplint
5dc6cb5c
hubertlu-tw Use __float2half2_rn and fix some cpplint warnings
64821fb1
Move some computations to LaunchFastGeluKernel
4e998546
Fix some Lint C++ warning
e8c19264
Using vectorized loads (float4) for fp16 to improve performance
6d819560
hubertlu-tw Switch whether to optimize FastGelu with float4 vectorization
98686e6a
hubertlu-tw Switch to float4 memory access based on input_length in FastGelu
b87d0a9e
Merge branch 'hubertlu/fastgelu' of https://github.com/ROCmSoftwarePl…
c1e8b7c4
Comment how to set the threshold of float2 and float4 vectorized kernels
5264e642
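
The commits above pick between scalar, float2 and float4 kernels based on input_length, but the page does not show the actual thresholds. A minimal sketch of such a dispatch rule follows, assuming a helper named ChooseVectorWidth and a placeholder threshold kVec4MinLength; both are hypothetical and not taken from the PR.

```cpp
#include <cstddef>

// Placeholder threshold (assumption): the real value would be tuned per GPU
// and is not given on this page.
constexpr std::size_t kVec4MinLength = std::size_t{1} << 20;

// Hypothetical helper: how many elements one thread handles per memory access.
// A wider access is only chosen when both lengths divide evenly, so a vector
// never straddles the end of the input or the end of the bias row.
inline int ChooseVectorWidth(std::size_t input_length, std::size_t bias_length) {
  const bool div4 = input_length % 4 == 0 && bias_length % 4 == 0;
  const bool div2 = input_length % 2 == 0 && bias_length % 2 == 0;
  if (div4 && input_length >= kVec4MinLength) return 4;  // widest access
  if (div2) return 2;                                    // medium access
  return 1;                                              // scalar fallback
}
```

In a real launcher the returned width would select which kernel instantiation to run; the divisibility checks are the part most likely to carry over, while the threshold needs measurement on the target device.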
hubertlu-tw changed the title from "Hubertlu/fastgelu" to "Optimize FastGelu with float2 and float4 vectorized kernels on ROCm" 3 years ago
hubertlu-tw Merge branch 'master' into hubertlu/fastgelu
4b6a1bb8
hubertlu-tw
hubertlu-tw
zhangyaobit
zhangyaobit commented on 2022-05-27
zhangyaobit requested a review from pengwa 3 years ago
zhangyaobit requested a review from PeixuanZuo 3 years ago
zhangyaobit
Add FastGelu fp16 unit tests for bias_length = 2 and 8
35e2caad
Make vectorized kernels generic with aligned_vector
96d548df
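
To illustrate what "generic with aligned_vector" can look like, here is a host-side C++ sketch, not the PR's HIP kernel. It assumes the widely used tanh approximation of GELU, a stand-in type AlignedVector, and hypothetical function names; on a GPU each loop iteration would correspond to one thread issuing one wide load.

```cpp
#include <cmath>
#include <cstddef>
#include <cstring>

// Stand-in (assumption) for an aligned_vector type: N values of T that can be
// moved with a single wide memory access. N must be a power of two.
template <typename T, int N>
struct alignas(sizeof(T) * N) AlignedVector {
  T val[N];
};

// tanh approximation of GELU: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
inline float FastGeluScalar(float x) {
  constexpr float kB = 0.7978845608f;   // sqrt(2/pi)
  constexpr float kC = 0.044715f * kB;  // cubic coefficient folded into kB
  return x * (0.5f + 0.5f * std::tanh(x * (kC * x * x + kB)));
}

// Processes N elements per step. Assumes length % N == 0 and, when bias is
// non-null, bias_length % N == 0, so a vector never crosses a bias row.
template <int N>
void FastGeluVectorized(const float* input, const float* bias, float* output,
                        std::size_t length, std::size_t bias_length) {
  using Vec = AlignedVector<float, N>;
  for (std::size_t i = 0; i < length; i += N) {
    Vec x;
    std::memcpy(&x, input + i, sizeof(Vec));  // one wide load
    if (bias != nullptr) {
      Vec b;
      std::memcpy(&b, bias + (i % bias_length), sizeof(Vec));
      for (int k = 0; k < N; ++k) x.val[k] += b.val[k];
    }
    for (int k = 0; k < N; ++k) x.val[k] = FastGeluScalar(x.val[k]);
    std::memcpy(output + i, &x, sizeof(Vec));  // one wide store
  }
}
```

Because the width N is a template parameter, the same body covers the scalar (N = 1), float2-style (N = 2) and float4-style (N = 4) cases, and the null check on bias lets one function serve both the with-bias and without-bias paths.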
zhangyaobit
zhangyaobit commented on 2022-06-10
zhangyaobit
zhangyaobit commented on 2022-06-10
Unify the vectorized kernels with/without bias
81a571be
Refactor the code to suppress cpplint warnings
a78464dc
Solve formatting issues
07998575
zhangyaobit
zhangyaobit commented on 2022-06-17
Remove cudaDeviceProp from FastGeluKernel and LaunchFastGeluKernel
9269e11a
zhangyaobit
zhangyaobit dismissed these changes on 2022-06-20
zhangyaobit closed this 3 years ago
zhangyaobit reopened this 3 years ago
zhangyaobit
azure-pipelines
zhangyaobit
azure-pipelines
ytaous
Move fast_gelu_impl.h to rocm/bert
98d7c3ce
Merge remote-tracking branch 'upstream/master' into hubertlu/fastgelu
085ec9ae
hubertlu-tw dismissed their stale review via 085ec9ae 3 years ago
azure-pipelines
zhangyaobit
zhangyaobit
azure-pipelines
azure-pipelines
tianleiwu
tianleiwu commented on 2022-06-23
tianleiwu
tianleiwu commented on 2022-06-23
tianleiwu
tianleiwu dismissed these changes on 2022-06-23
Fix some Lint C++ warnings and code alignment
86dad9af
hubertlu-tw dismissed their stale review via 86dad9af 3 years ago
tianleiwu
tianleiwu
azure-pipelines
azure-pipelines
zhangyaobit
zhangyaobit approved these changes on 2022-06-24
tianleiwu
tianleiwu approved these changes on 2022-06-24
zhangyaobit merged f4ba199b into master 3 years ago
