onnxruntime
3bb9e955 - [MLAS][KleidiAI]Catlaw01/sgemm epilogue neon opt (#27609)

Commit
78 days ago
[MLAS][KleidiAI]Catlaw01/sgemm epilogue neon opt (#27609)

### Description

This change updates the KleidiAI SGEMM post-processing path in `onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp` with two parts:

- Correctness fix: in the `alpha == 0 || K == 0` fast path, beta handling is now applied for every batch entry (not just batch 0), so batched SGEMM behaviour is correct.
- NEON SGEMM epilogue optimisation: adds a vectorised alpha/beta post-processing path for contiguous outputs, with guarded fallback to scalar for non-contiguous or small cases. The 2D epilogue path also routes contiguous tiles through the contiguous 1D epilogue path to enable vectorisation.

### Motivation and Context

This change addresses correctness and performance in the SGEMM post-processing stage:

- The batched `alpha == 0 || K == 0` path previously used only `Data[0]`, which could produce incorrect results for `BatchSize > 1`.
- The post-processing loop (`C = alpha * (A*B) + beta * C`) is a known latency contributor when memcpy fast paths are not applicable. The NEON epilogue changes are intended to reduce this cost on supported ARM platforms while preserving existing fallback behaviour.

---------

Signed-off-by: Cathal Lawlor <cathal.lawlor@arm.com>
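The two parts described above can be illustrated with a minimal sketch. It is not the code from `sgemm_kleidiai.cpp`: the names `BatchEntry`, `ScaleCByBetaBatched`, `EpilogueContiguous` and `Epilogue2D` are hypothetical, the layout is assumed to be row-major, and the sketch assumes `beta == 0` only needs special handling in the fast path. The NEON branch is compiled only when `__ARM_NEON` is defined; elsewhere the scalar loop handles everything.

```cpp
#include <cstddef>
#include <vector>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical per-batch descriptor (not the MLAS data-params struct).
struct BatchEntry {
    float* C;         // output matrix, row-major
    std::size_t ldc;  // row stride of C, in elements
};

// Fast path for alpha == 0 || K == 0: the product term vanishes, so C = beta * C.
// The loop visits every batch entry rather than only batch[0].
void ScaleCByBetaBatched(const std::vector<BatchEntry>& batch,
                         std::size_t M, std::size_t N, float beta) {
    for (const BatchEntry& e : batch) {
        for (std::size_t m = 0; m < M; ++m) {
            float* row = e.C + m * e.ldc;
            for (std::size_t n = 0; n < N; ++n) {
                // beta == 0 overwrites C so stale NaN/Inf values do not propagate.
                row[n] = (beta == 0.0f) ? 0.0f : beta * row[n];
            }
        }
    }
}

// 1D epilogue over n contiguous elements: c[i] = alpha * ab[i] + beta * c[i].
void EpilogueContiguous(const float* ab, float* c, std::size_t n,
                        float alpha, float beta) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    const float32x4_t valpha = vdupq_n_f32(alpha);
    const float32x4_t vbeta = vdupq_n_f32(beta);
    for (; i + 4 <= n; i += 4) {
        float32x4_t r = vmulq_f32(vld1q_f32(ab + i), valpha);  // alpha * ab
        r = vmlaq_f32(r, vld1q_f32(c + i), vbeta);             // + beta * c
        vst1q_f32(c + i, r);
    }
#endif
    // Scalar tail; also the whole loop on non-NEON builds.
    for (; i < n; ++i) {
        c[i] = alpha * ab[i] + beta * c[i];
    }
}

// 2D epilogue: contiguous tiles are routed through the 1D path,
// strided tiles fall back to a plain scalar loop.
void Epilogue2D(const float* ab, std::size_t ldab, float* c, std::size_t ldc,
                std::size_t M, std::size_t N, float alpha, float beta) {
    if (ldab == N && ldc == N) {
        EpilogueContiguous(ab, c, M * N, alpha, beta);
        return;
    }
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            c[m * ldc + n] = alpha * ab[m * ldab + n] + beta * c[m * ldc + n];
        }
    }
}
```

Treating a tile whose row strides equal `N` as a single run of `M * N` elements lets one vector loop cover the whole tile without handling a tail on every row.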