onnxruntime
79a2d250 - [ARM CPU] SVE support for Elementwise kernels (#25238)

Commit
226 days ago
### Description

Ports the `MlasErfKernel`, `MlasLogisticKernel` and `MlasComputeSoftmax` kernels to the ARM SVE backend. Specifically, the following functions have been ported:

- `MlasErfKernel` (lib/erf.cpp)
- `MlasLogisticKernel` (lib/logistic.cpp)
- `MlasComputeSumExpF32Kernel` (lib/compute.cpp)
- `MlasReduceMaximumF32Kernel` (lib/compute.cpp)
- `MlasComputeSoftmaxOutputF32Kernel` (lib/compute.cpp)
- `MlasComputeSoftmaxThreaded` (lib/compute.cpp)

Design: this PR adds new wrapper implementations of SVE functions in `lib/mlasi_sve.h`, similar to those in `mlasi.h`, and calls these wrappers from each kernel's implementation.

### Motivation and Context

This work is a step toward making ONNX Runtime more performant and architecture-aware on ARM platforms.

### Performance Analysis

![image](https://github.com/user-attachments/assets/34120c33-0ead-4a03-9d84-e74b1dc61856)

- Observed up to 1.4x speedup at the operator level
- Performance was tested on AWS Graviton3E

This PR is a joint contribution by:

- @NishantPrabhuFujitsu
- @sanketkaleoss

---------

Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
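For readers unfamiliar with how the softmax-related kernels named in the description fit together, the following is a minimal scalar sketch of the usual three-pass structure: reduce to a maximum, accumulate the sum of exponentials, then scale the output. It is not the code from this PR and uses no SVE intrinsics. All function names (`ReduceMaximumRef`, `ComputeSumExpRef`, `SoftmaxOutputRef`, `SoftmaxRef`) and their signatures are hypothetical stand-ins chosen for illustration, and the actual MLAS kernels may divide the work differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar stand-in for a max-reduction kernel (not the MLAS API).
float ReduceMaximumRef(const float* input, std::size_t n) {
    assert(n > 0);  // a maximum is undefined for an empty range
    float maximum = input[0];
    for (std::size_t i = 1; i < n; ++i) {
        maximum = std::max(maximum, input[i]);
    }
    return maximum;
}

// Hypothetical stand-in for a sum-of-exponentials kernel: writes
// exp(input[i] + negative_maximum) to output and returns the running sum.
float ComputeSumExpRef(const float* input, float* output, std::size_t n,
                       float negative_maximum) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float e = std::exp(input[i] + negative_maximum);
        output[i] = e;
        sum += e;
    }
    return sum;
}

// Hypothetical stand-in for the output pass: scales each element in place.
void SoftmaxOutputRef(float* output, std::size_t n, float scale) {
    for (std::size_t i = 0; i < n; ++i) {
        output[i] *= scale;
    }
}

// Composes the three passes into a softmax over one row.
std::vector<float> SoftmaxRef(const std::vector<float>& x) {
    std::vector<float> y(x.size());
    if (x.empty()) {
        return y;
    }
    // Subtracting the maximum keeps exp() from overflowing on large inputs.
    const float maximum = ReduceMaximumRef(x.data(), x.size());
    const float sum = ComputeSumExpRef(x.data(), y.data(), x.size(), -maximum);
    SoftmaxOutputRef(y.data(), y.size(), 1.0f / sum);
    return y;
}
```

A vectorized port would typically keep this pass structure and replace each loop body with vector loads, arithmetic, and stores. That is presumably where wrapper functions such as those the PR places in `lib/mlasi_sve.h` come in, though the exact mapping is an assumption here.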