onnxruntime
4013dc19 - Implement multithreading in qgemm_kleidi (#26301)

Commit
109 days ago
Implement multithreading in qgemm_kleidi (#26301)

**Key changes**

This PR improves the performance of dynamic QGEMMs by implementing tiling and threading across operations. It introduces thread-local buffers so that memory can be reused during inference, and uses those buffers in dynamic quantised MatMul operations that run on KleidiAI kernels. It also updates KleidiAI to version 1.15.0.

**Example performance**

Single thread:

<img width="2100" height="900" alt="ort_ops_compare_encoder_1_2025-10-02_17-21-32_vs_encoder_1_2025-10-02_16-54-55" src="https://github.com/user-attachments/assets/c23c808d-5fab-4995-997e-a57a66a23d68" />

2 threads:

<img width="2100" height="900" alt="ort_ops_compare_encoder_2_2025-10-02_17-21-47_vs_encoder_2_2025-10-02_16-55-13" src="https://github.com/user-attachments/assets/31a0eb7a-7ff4-40c9-9425-b70231f131e8" />

---------

Signed-off-by: melkap01 <melike.kaptan@arm.com>
Signed-off-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Co-authored-by: Damien Dooley <damien.dooley@arm.com>
Co-authored-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
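
As a rough illustration of the general idea described in the key changes (tiling the work, spreading tiles across threads, and reusing a thread-local buffer), here is a minimal C++ sketch. It is not the code from this PR: `ThreadLocalScratch` and `TiledMatMul` are hypothetical names, the arithmetic is plain `float` rather than quantised, raw `std::thread` stands in for whatever thread pool the real code uses, and the copy into the scratch buffer is only a stand-in for a real packing step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: a per-thread scratch buffer that only ever grows, so
// repeated calls on the same thread reuse the existing allocation.
static float* ThreadLocalScratch(std::size_t count) {
    thread_local std::vector<float> buffer;
    if (buffer.size() < count) {
        buffer.resize(count);
    }
    return buffer.data();
}

// Hypothetical sketch: C[M x N] = A[M x K] * B[K x N], all row-major.
// Rows of C are split into tiles of tile_m rows; tiles are handed out to
// num_threads workers in round-robin order.
void TiledMatMul(const float* A, const float* B, float* C,
                 std::size_t M, std::size_t N, std::size_t K,
                 std::size_t tile_m, std::size_t num_threads) {
    assert(A != nullptr && B != nullptr && C != nullptr);
    if (tile_m == 0) tile_m = 1;
    if (num_threads == 0) num_threads = 1;
    const std::size_t tile_count = (M + tile_m - 1) / tile_m;

    auto worker = [=](std::size_t tid) {
        for (std::size_t t = tid; t < tile_count; t += num_threads) {
            const std::size_t m0 = t * tile_m;
            const std::size_t rows = std::min(tile_m, M - m0);
            // Stand-in for a packing step: copy this tile's rows of A into
            // the calling thread's reusable scratch buffer.
            float* packed = ThreadLocalScratch(rows * K);
            std::copy(A + m0 * K, A + (m0 + rows) * K, packed);
            for (std::size_t i = 0; i < rows; ++i) {
                for (std::size_t j = 0; j < N; ++j) {
                    float acc = 0.0f;
                    for (std::size_t k = 0; k < K; ++k) {
                        acc += packed[i * K + k] * B[k * N + j];
                    }
                    C[(m0 + i) * N + j] = acc;
                }
            }
        }
    };

    // Spawn num_threads - 1 helpers; the calling thread acts as worker 0.
    std::vector<std::thread> threads;
    for (std::size_t tid = 1; tid < num_threads; ++tid) {
        threads.emplace_back(worker, tid);
    }
    worker(0);
    for (auto& th : threads) {
        th.join();
    }
}
```

Each tile writes a disjoint set of rows of `C`, so the workers need no locking, and because the scratch buffer is `thread_local`, no thread ever touches another thread's buffer. In this sketch the helper threads are created per call, so only the calling thread's buffer survives between calls; with long-lived pool threads every worker would keep its allocation.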