onnxruntime
Optimize FastGelu with float2 and float4 vectorized kernels on ROCm
#11491
Merged

Commits
  • Using vectorized loads (float2) for fp16 to improve performance
    hubertlu-tw committed 3 years ago
  • Fix a few warnings from cpplint
    hubertlu-tw committed 3 years ago
  • Fix a few warnings from cpplint
    hubertlu-tw committed 3 years ago
  • Use __float2half2_rn and fix some cpplint warnings
    hubertlu-tw committed 3 years ago
  • Move some computations to LaunchFastGeluKernel
    hubertlu-tw committed 3 years ago
  • Fix some Lint C++ warning
    hubertlu-tw committed 3 years ago
  • Using vectorized loads (float4) for fp16 to improve performance
    hubertlu-tw committed 3 years ago
  • Switch whether to optimize FastGelu with float4 vectorization
    hubertlu-tw committed 3 years ago
  • Switch to float4 memory access based on input_length in FastGelu
    hubertlu-tw committed 3 years ago
  • Merge branch 'hubertlu/fastgelu' of https://github.com/ROCmSoftwarePlatform/onnxruntime into hubertlu/fastgelu
    hubertlu-tw committed 3 years ago
  • Comment how to set the threshold of float2 and float4 vectorized kernels
    hubertlu-tw committed 3 years ago
  • Merge branch 'master' into hubertlu/fastgelu
    hubertlu-tw committed 3 years ago
  • Add FastGelu fp16 unit tests for bias_length = 2 and 8
    hubertlu-tw committed 3 years ago
  • Make vectorized kernels generic with aligned_vector
    hubertlu-tw committed 3 years ago
  • Unify the vectorized kernels with/without bias
    hubertlu-tw committed 3 years ago
  • Refactor the code to suppress cpplint warnings
    hubertlu-tw committed 3 years ago
  • Solve formatting issues
    hubertlu-tw committed 3 years ago
  • Remove cudaDeviceProp from FastGeluKernel and LaunchFastGeluKernel
    hubertlu-tw committed 3 years ago
  • Move fast_gelu_impl.h to rocm/bert
    hubertlu-tw committed 3 years ago
  • Merge remote-tracking branch 'upstream/master' into hubertlu/fastgelu
    hubertlu-tw committed 3 years ago
  • Fix some Lint C++ warnings and code alignment
    hubertlu-tw committed 3 years ago