onnxruntime
92c1ed27 - Implement FP32 kleidiai Gemv (#26302)

Commit
5 days ago
### Description

Implements a special SGEMM path that uses GEMV kernels when M or N is 1.

Additionally, this PR introduces a microkernel interface built on the typedefs provided by KleidiAI. This simplifies the code and removes things such as ternary operations for choosing between SME1 and SME2 kernels.

### Indicative Performance

In lieu of any production models where GEMV was a large contributor to the network, I opted to create a mini model for testing. It contains thousands of randomized MatMul variants, with a distribution of GEMV cases throughout.

<img width="1572" height="148" alt="image (6)" src="https://github.com/user-attachments/assets/451441e4-df5b-42d1-8c6e-ec8dd14161e6" />

Using the onnxruntime perf test, I was able to halve the total inference time vs MLAS with this model.

<img width="1200" height="900" alt="ort_ops_compare_gemv_no_2025-10-07_19-40-30_vs_gemv_2025-10-07_19-40-58" src="https://github.com/user-attachments/assets/ddef3bf3-796c-4f58-8712-361510e2a901" />

**_More benchmarks to come shortly_**

---------

Signed-off-by: Jonathan Clohessy <Jonathan.Clohessy@arm.com>
Signed-off-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
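To make the "M or N is 1" condition concrete, here is a minimal sketch of that kind of dispatch. It is not the code from this PR and does not call KleidiAI. The names `SgemmSketch`, `SgemmPath`, `VecMat`, `MatVec`, and `NaiveGemm` are hypothetical, the loops are plain scalar reference code, and row-major, non-transposed operands are assumed.

```cpp
#include <cstddef>

// Hypothetical: reports which path the sketch took so the dispatch is observable.
enum class SgemmPath { VecMat, MatVec, Gemm };

// M == 1 case: y (1xN) = x (1xK) * B (KxN), with B stored row-major.
static void VecMat(const float* x, const float* B, float* y,
                   std::size_t K, std::size_t N) {
    for (std::size_t n = 0; n < N; ++n) y[n] = 0.0f;
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t n = 0; n < N; ++n)
            y[n] += x[k] * B[k * N + n];
}

// N == 1 case: y (Mx1) = A (MxK) * x (Kx1), with A stored row-major.
static void MatVec(const float* A, const float* x, float* y,
                   std::size_t M, std::size_t K) {
    for (std::size_t m = 0; m < M; ++m) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < K; ++k) acc += A[m * K + k] * x[k];
        y[m] = acc;
    }
}

// General fallback: C (MxN) = A (MxK) * B (KxN), all row-major.
static void NaiveGemm(std::size_t M, std::size_t N, std::size_t K,
                      const float* A, const float* B, float* C) {
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k)
                acc += A[m * K + k] * B[k * N + n];
            C[m * N + n] = acc;
        }
}

// Route degenerate shapes to a GEMV routine; everything else goes to GEMM.
SgemmPath SgemmSketch(std::size_t M, std::size_t N, std::size_t K,
                      const float* A, const float* B, float* C) {
    if (M == 1) {            // A is a single row vector
        VecMat(A, B, C, K, N);
        return SgemmPath::VecMat;
    }
    if (N == 1) {            // B is a single column, i.e. a contiguous vector
        MatVec(A, B, C, M, K);
        return SgemmPath::MatVec;
    }
    NaiveGemm(M, N, K, A, B, C);
    return SgemmPath::Gemm;
}
```

In a real backend the two GEMV branches would call optimized kernels and would also have to handle transposition, alpha/beta scaling, and leading dimensions, all of which this sketch leaves out.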