onnxruntime
ec3bf7f0 - Integrate SME1 SGEMM KleidiAI kernels (#25760)

Commit
138 days ago
**Key changes**

- Integrates the KleidiAI SME1 FP32 kernels into the existing `kleidiai_sgemm.cpp` implementation.
- Adds an SME2 flag in `onnxruntime/core/common/cpuid_info.h` and `onnxruntime/core/common/cpuid_info.cc`. The previously integrated SME2 kernels were gated on the SME(1) check; this change correctly distinguishes when the SME1 kernels and when the SME2 kernels should be used.
- Bumps the KleidiAI version to 1.10.0.

**Indicative performance data**

Single-threaded runs of various models on a Mac Mini M4, using:
`onnxruntime_perf_test -v -e cpu -I -m times -x 1 -y 1 -r 1`

<img width="785" height="400" alt="image" src="https://github.com/user-attachments/assets/37c0b271-14fb-4b76-b2a0-28c5dd9308aa" />

**Next steps**

Follow-up commits will address the outstanding to-do items from the previous PR:
[KleidiAI SGEMM/IGEMM/Quantized MatMul - Modular MLAS API Changes for KleidiAI #25187](https://github.com/microsoft/onnxruntime/pull/25187)

Signed-off-by: Patryk Kaiser <patryk.kaiser@arm.com>
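The SME1/SME2 distinction in this change boils down to a capability-ordered dispatch: check the more capable feature on its own flag first, then fall back. The following is a minimal C++ sketch under stated assumptions: `ArmFeatures`, `SgemmPath` and `SelectSgemmPath` are hypothetical names, not the actual `cpuid_info` or `kleidiai_sgemm.cpp` API, and the feature flags are assumed to be populated elsewhere by the platform's CPU feature detection.

```cpp
#include <cassert>

// Hypothetical snapshot of the Arm CPU features relevant to SGEMM kernel
// selection. In ONNX Runtime these flags would come from cpuid_info; the
// struct and field names here are illustrative, not the real API.
struct ArmFeatures {
  bool has_sme = false;   // FEAT_SME (SME1)
  bool has_sme2 = false;  // FEAT_SME2; architecturally requires FEAT_SME
};

// Hypothetical identifiers for the available SGEMM code paths.
enum class SgemmPath { kKleidiAISme2, kKleidiAISme1, kDefaultMlas };

// Picks the most capable kernel family the CPU reports. SME2 is tested on its
// own flag: gating SME2 kernels on the SME1 flag alone would also select them
// on SME1-only hardware, which cannot execute SME2 instructions.
SgemmPath SelectSgemmPath(const ArmFeatures& f) {
  // SME2 reported without SME would indicate inconsistent feature detection.
  assert(!f.has_sme2 || f.has_sme);
  if (f.has_sme2) return SgemmPath::kKleidiAISme2;
  if (f.has_sme) return SgemmPath::kKleidiAISme1;
  return SgemmPath::kDefaultMlas;
}
```

For example, `SelectSgemmPath(ArmFeatures{true, false})` yields the SME1 path, while a CPU reporting both flags gets the SME2 path. Ordering the checks from most to least capable keeps the selection correct as further kernel families are added.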