onnxruntime
21404e34 - [CUDA EP] Add hardswish op and add bf16 support for hardsigmoid (#25562)

Commit
129 days ago
### Description

- Add the HardSwish operator, defined as x * HardSigmoid(x).
- Add bf16 support for HardSigmoid.

### Motivation and Context

HardSwish is currently implemented as HardSigmoid + Add in the CUDA EP. A fused HardSwish should take half the time of HardSigmoid + Add.

---------

Co-authored-by: kaiyu <kaiyu@bytedance.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
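To illustrate the fusion the description refers to, here is a minimal scalar sketch in Python. It is not the CUDA kernel from this commit. It assumes the ONNX definition of HardSwish, which fixes `alpha = 1/6` and `beta = 0.5` in the HardSigmoid term; the function names are hypothetical and chosen for illustration only.

```python
# Illustrative reference only; the real change is a CUDA kernel in onnxruntime.
# Assumption: ONNX HardSwish semantics, y = x * max(0, min(1, alpha*x + beta))
# with alpha = 1/6 and beta = 0.5.

ALPHA = 1.0 / 6.0
BETA = 0.5


def hard_sigmoid(x, alpha=ALPHA, beta=BETA):
    """Piecewise-linear sigmoid approximation, clamped to [0, 1]."""
    return max(0.0, min(1.0, alpha * x + beta))


def hard_swish_unfused(xs):
    """Two passes: materialize the HardSigmoid output, then multiply by x."""
    gate = [hard_sigmoid(x) for x in xs]      # first pass writes an intermediate
    return [x * g for x, g in zip(xs, gate)]  # second pass reads it back


def hard_swish_fused(xs):
    """One pass: compute the gate and the product per element, no intermediate."""
    return [x * max(0.0, min(1.0, ALPHA * x + BETA)) for x in xs]
```

Both functions return the same values. The fused form reads each input once and writes each output once, which is the kind of saving the commit message expects from a single fused kernel.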