Quantized MatMul - Modular MLAS API Changes for KleidiAI (#25187)

Commit

237 days ago

KleidiAI SGEMM/IGEMM/Quantized MatMul - Modular MLAS API Changes for KleidiAI (#25187) This PR introduces the initial integration of KleidiAI-optimized microkernels into ONNX Runtime's MLAS backend, focusing on support for: - SGEMM - IGEMM - Dynamic Quantized MatMuls Key changes: Implements overrides for MlasGemmBatch, MlasGemmPackBSize, and MlasGemmPackB using KleidiAI where applicable. Applies dispatch logic based on TransA == CblasNoTrans and SME2 availability. Supports float32 and int8 GEMM workloads with conditionally invoked SME2 paths. Maintains fallback paths to default MLAS implementations to ensure coverage and stability. **Known Issues / Next Steps:** Requesting feedback specifically on the API structure: Does the new MLAS interface design align with long-term extensibility? Are the dispatch points and override boundaries well-structured? Indicative Performance figures: The kernels added are particularly effective for Conv2D operators: * Based on KleidiAI SME running mobilenet_v1_ssd_f32 on Mac Mini M4 on a single thread <img width="815" height="308" alt="image" src="https://github.com/user-attachments/assets/e39a7fef-1370-4332-83a3-1f3a80b29da4" /> --------- Signed-off-by: Damien Dooley <damien.dooley@arm.com> Co-authored-by: Jonathan Clohessy <jonathan.clohessy@arm.com> Co-authored-by: Declan Flavin <declan.flavin@arm.com> Co-authored-by: Colm Donelan <colm.donelan@arm.com> Co-authored-by: Damien Dooley <damdoo01@ip-10-249-28-46.eu-west-1.compute.internal>

References

#25187 - KleidiAI SGEMM/IGEMM/Quantized MatMul - Modular MLAS API Changes for KleidiAI

Author

damdoo01-arm

Parents

11aebeb2

onnxruntime cd450d15 - KleidiAI SGEMM/IGEMM/Quantized MatMul - Modular MLAS API Changes for KleidiAI (#25187)

onnxruntime
cd450d15 - KleidiAI SGEMM/IGEMM/Quantized MatMul - Modular MLAS API Changes for KleidiAI (#25187)