onnxruntime
Optimize FastGelu with float2 and float4 vectorized kernels on ROCm
#11491
Merged
Commits (21)
Using vectorized loads (float2) for fp16 to improve performance
hubertlu-tw
committed
3 years ago
Fix a few warnings from cpplint
hubertlu-tw
committed
3 years ago
Fix a few warnings from cpplint
hubertlu-tw
committed
3 years ago
Use __float2half2_rn and fix some cpplint warnings
hubertlu-tw
committed
3 years ago
Move some computations to LaunchFastGeluKernel
hubertlu-tw
committed
3 years ago
Fix some Lint C++ warning
hubertlu-tw
committed
3 years ago
Using vectorized loads (float4) for fp16 to improve performance
hubertlu-tw
committed
3 years ago
Switch whether to optimize FastGelu with float4 vectorization
hubertlu-tw
committed
3 years ago
Switch to float4 memory access based on input_length in FastGelu
hubertlu-tw
committed
3 years ago
Merge branch 'hubertlu/fastgelu' of https://github.com/ROCmSoftwarePlatform/onnxruntime into hubertlu/fastgelu
hubertlu-tw
committed
3 years ago
Comment how to set the threshold of float2 and float4 vectorized kernels
hubertlu-tw
committed
3 years ago
Merge branch 'master' into hubertlu/fastgelu
hubertlu-tw
committed
3 years ago
Add FastGelu fp16 unit tests for bias_length = 2 and 8
hubertlu-tw
committed
3 years ago
Make vectorized kernels generic with aligned_vector
hubertlu-tw
committed
3 years ago
Unify the vectorized kernels with/without bias
hubertlu-tw
committed
3 years ago
Refactor the code to suppress cpplint warnings
hubertlu-tw
committed
3 years ago
Solve formatting issues
hubertlu-tw
committed
3 years ago
Remove cudaDeviceProp from FastGeluKernel and LaunchFastGeluKernel
hubertlu-tw
committed
3 years ago
Move fast_gelu_impl.h to rocm/bert
hubertlu-tw
committed
3 years ago
Merge remote-tracking branch 'upstream/master' into hubertlu/fastgelu
hubertlu-tw
committed
3 years ago
Fix some Lint C++ warnings and code alignment
hubertlu-tw
committed
3 years ago
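
The commit messages above mention float2/float4 vectorized access, an aligned_vector helper, one kernel for the bias and no-bias cases, and a switch based on input_length. The following is a minimal CPU-side sketch of that idea in plain C++, not the HIP code from this PR. The names AlignedVector, FastGeluChunked, PickVectorWidth and FastGeluDispatch are hypothetical, the divisibility-based width selection is an assumption (the PR's actual threshold is not shown on this page), and the constants are those of the usual tanh approximation of GELU.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the aligned_vector helper named in the commits:
// N values of T grouped so they can be moved as one unit.
template <typename T, int N>
struct alignas(sizeof(T) * N) AlignedVector {
  T val[N];
};

// Tanh approximation of GELU (constants assumed, not copied from the PR).
inline float FastGeluScalar(float x) {
  constexpr float kA = 0.5f;
  constexpr float kB = 0.7978845608028654f;    // sqrt(2 / pi)
  constexpr float kC = 0.035677408136300125f;  // 0.044715 * sqrt(2 / pi)
  return x * (kA + kA * std::tanh(x * (kC * x * x + kB)));
}

// Processes N contiguous elements per step. A null bias selects the
// no-bias path, so one routine serves both cases.
template <int N>
void FastGeluChunked(const float* input, const float* bias, float* output,
                     std::size_t input_length, std::size_t bias_length) {
  assert(input_length % N == 0);
  assert(bias == nullptr || bias_length > 0);
  for (std::size_t base = 0; base < input_length; base += N) {
    AlignedVector<float, N> chunk;
    std::memcpy(chunk.val, input + base, sizeof(chunk.val));  // one wide load
    for (int i = 0; i < N; ++i) {
      float x = chunk.val[i];
      if (bias != nullptr) x += bias[(base + i) % bias_length];
      chunk.val[i] = FastGeluScalar(x);
    }
    std::memcpy(output + base, chunk.val, sizeof(chunk.val));  // one wide store
  }
}

// Assumed selection rule: widest chunk that divides input_length evenly.
inline int PickVectorWidth(std::size_t input_length) {
  if (input_length % 4 == 0) return 4;
  if (input_length % 2 == 0) return 2;
  return 1;
}

inline void FastGeluDispatch(const float* input, const float* bias,
                             float* output, std::size_t input_length,
                             std::size_t bias_length) {
  switch (PickVectorWidth(input_length)) {
    case 4:
      FastGeluChunked<4>(input, bias, output, input_length, bias_length);
      break;
    case 2:
      FastGeluChunked<2>(input, bias, output, input_length, bias_length);
      break;
    default:
      FastGeluChunked<1>(input, bias, output, input_length, bias_length);
      break;
  }
}
```

On a GPU the wide load and store would typically be a single memory transaction per thread, which is where the speedup described in the PR title would come from; on the CPU this sketch only illustrates the chunking and dispatch structure.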