onnxruntime
Optimize FastGelu with float2 and float4 vectorized kernels on ROCm
#11491
Merged
zhangyaobit merged 21 commits into microsoft:master from ROCm:hubertlu/fastgelu
Using vectorized loads (float2) for fp16 to improve performance
664bb50e
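For readers unfamiliar with the operator, the following is a minimal CPU-side sketch of what a two-wide FastGelu-with-bias loop looks like. It assumes the usual tanh approximation of GELU, uses `float` instead of fp16 so that it compiles as standard C++, and uses `std::memcpy` as a stand-in for a single wide load or store. The names `FastGeluScalar`, `Pack2`, and `FastGeluPairs` are hypothetical and are not taken from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <vector>

// Tanh approximation of GELU:
//   0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
inline float FastGeluScalar(float x) {
  constexpr float kAlpha = 0.7978845608028654f;  // sqrt(2/pi)
  constexpr float kBeta = 0.044715f * kAlpha;    // cubic-term coefficient
  return x * (0.5f + 0.5f * std::tanh(x * (kBeta * x * x + kAlpha)));
}

// Hypothetical two-element pack standing in for a float2-style access.
struct alignas(2 * sizeof(float)) Pack2 {
  float v[2];
};

// Applies FastGelu(input + bias) two elements per step. The bias is
// broadcast along the last dimension (index i % bias.size()).
// Assumption: both lengths are even, so a pack never straddles a bias row.
inline std::vector<float> FastGeluPairs(const std::vector<float>& input,
                                        const std::vector<float>& bias) {
  assert(!bias.empty() && input.size() % 2 == 0 && bias.size() % 2 == 0);
  std::vector<float> output(input.size());
  for (std::size_t i = 0; i < input.size(); i += 2) {
    Pack2 x, b, y;
    std::memcpy(&x, &input[i], sizeof(Pack2));               // one wide load
    std::memcpy(&b, &bias[i % bias.size()], sizeof(Pack2));  // one wide load
    for (int k = 0; k < 2; ++k) {
      y.v[k] = FastGeluScalar(x.v[k] + b.v[k]);
    }
    std::memcpy(&output[i], &y, sizeof(Pack2));              // one wide store
  }
  return output;
}
```

On a GPU the same idea lets each thread issue one wider memory transaction instead of several narrow ones, which is the motivation the commit title gives for the change.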
Fix a few warnings from cpplint
68904e56
Fix a few warnings from cpplint
5dc6cb5c
Use __float2half2_rn and fix some cpplint warnings
64821fb1
Move some computations to LaunchFastGeluKernel
4e998546
Fix some Lint C++ warning
e8c19264
Using vectorized loads (float4) for fp16 to improve performance
6d819560
Switch whether to optimize FastGelu with float4 vectorization
98686e6a
Switch to float4 memory access based on input_length in FastGelu
b87d0a9e
Merge branch 'hubertlu/fastgelu' of https://github.com/ROCmSoftwarePl…
c1e8b7c4
Comment how to set the threshold of float2 and float4 vectorized kernels
5264e642
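The threshold commits above suggest a host-side choice between access widths. A rough sketch of such a dispatch follows. It assumes, purely for illustration, that the 16-byte (float4-sized) path is used only for sufficiently long inputs and that both the input length and the bias length must be divisible by the number of elements per access. The function name `ChooseVectorBytes` and the cutoff value are hypothetical; the PR's actual threshold and conditions are not reproduced here.

```cpp
#include <cstddef>

// Placeholder cutoff, NOT the value used in the PR. A real threshold would
// be tuned by benchmarking on the target hardware.
constexpr std::size_t kHypotheticalFloat4Threshold = 1u << 20;

// Returns the number of bytes to move per memory access:
//   16 (float4-sized), 8 (float2-sized), or elem_size (scalar fallback).
inline std::size_t ChooseVectorBytes(std::size_t input_length,
                                     std::size_t bias_length,
                                     std::size_t elem_size) {
  const std::size_t elems16 = 16 / elem_size;  // elements per 16-byte access
  const std::size_t elems8 = 8 / elem_size;    // elements per 8-byte access

  // Widest path: only for long inputs whose lengths divide evenly.
  if (input_length >= kHypotheticalFloat4Threshold &&
      input_length % elems16 == 0 && bias_length % elems16 == 0) {
    return 16;
  }
  // Narrower vectorized path.
  if (input_length % elems8 == 0 && bias_length % elems8 == 0) {
    return 8;
  }
  // Scalar fallback when neither vector width fits.
  return elem_size;
}
```

With `elem_size = 2` (fp16), a 16-byte access covers 8 elements and an 8-byte access covers 4, so under these assumptions a short bias forces the scalar path regardless of the input length.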
hubertlu-tw changed the title from "Hubertlu/fastgelu" to "Optimize FastGelu with float2 and float4 vectorized kernels on ROCm" 3 years ago
Merge branch 'master' into hubertlu/fastgelu
4b6a1bb8
zhangyaobit
commented on 2022-05-27
zhangyaobit requested a review from pengwa 3 years ago
zhangyaobit requested a review from PeixuanZuo 3 years ago
Add FastGelu fp16 unit tests for bias_length = 2 and 8
35e2caad
Make vectorized kernels generic with aligned_vector
96d548df
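The `aligned_vector` idiom named in this commit is commonly written as a fixed-size array whose alignment equals its total size, so one pack corresponds to one wide load or store. The sketch below shows that idiom on the CPU with a generic element-wise loop. `AlignedVector` and `TransformPacked` are illustrative names chosen here, and the PR's own definition may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Pack of VecSize elements aligned to its own size. sizeof(T) * VecSize
// must be a power of two for the alignas specifier to be valid.
template <typename T, int VecSize>
struct alignas(sizeof(T) * VecSize) AlignedVector {
  T val[VecSize];
};

// Applies op to n elements, VecSize at a time. One template covers every
// width, so separate float2 and float4 code paths are not needed.
// Assumption: n is a multiple of VecSize.
template <typename T, int VecSize, typename Op>
void TransformPacked(const T* in, T* out, std::size_t n, Op op) {
  assert(n % VecSize == 0);
  using Pack = AlignedVector<T, VecSize>;
  for (std::size_t i = 0; i < n; i += VecSize) {
    Pack p;
    std::memcpy(&p, in + i, sizeof(Pack));   // stands in for one vector load
    for (int k = 0; k < VecSize; ++k) {
      p.val[k] = op(p.val[k]);
    }
    std::memcpy(out + i, &p, sizeof(Pack));  // stands in for one vector store
  }
}
```

A call such as `TransformPacked<float, 4>(in, out, n, f)` processes four floats per step, and changing the second template argument is enough to change the width.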
zhangyaobit
commented on 2022-06-10
zhangyaobit
commented on 2022-06-10
Unify the vectorized kernels with/without bias
81a571be
Refactor the code to suppress cpplint warnings
a78464dc
Solve formatting issues
07998575
zhangyaobit
commented on 2022-06-17
Remove cudaDeviceProp from FastGeluKernel and LaunchFastGeluKernel
9269e11a
zhangyaobit
dismissed these changes on 2022-06-20
zhangyaobit closed this 3 years ago
zhangyaobit reopened this 3 years ago
Move fast_gelu_impl.h to rocm/bert
98d7c3ce
Merge remote-tracking branch 'upstream/master' into hubertlu/fastgelu
085ec9ae
hubertlu-tw dismissed their stale review via 085ec9ae 3 years ago
tianleiwu
commented on 2022-06-23
tianleiwu
commented on 2022-06-23
tianleiwu
dismissed these changes on 2022-06-23
Fix some Lint C++ warnings and code alignment
86dad9af
hubertlu-tw dismissed their stale review via 86dad9af 3 years ago
zhangyaobit
approved these changes on 2022-06-24
tianleiwu
approved these changes on 2022-06-24
zhangyaobit merged f4ba199b into master 3 years ago
Reviewers
tianleiwu
zhangyaobit
pengwa
PeixuanZuo
Assignees
No one assigned
Labels
None yet
Milestone
No milestone