onnxruntime
36cbdb49 - [MLAS] Removed memcpy step by storing result in C if possible (#27367)

Commit
39 days ago
## Summary

This change removes the memcpy step in sgemm_kleidiai where possible by writing directly to C.

## Testing

| Model | Baseline avg (ms) | Current avg (ms) | Δ ms | Δ % |
| -- | -- | -- | -- | -- |
| Transformer_complex_f32.onnx | 2.929885 | 2.701083 | -0.228802 | -7.81% |
| bert_tiny_f32.onnx | 0.279675 | 0.273928 | -0.005747 | -2.05% |
| de_efficientnetlitev3_f32.onnx | 80.038132 | 78.560747 | -1.477385 | -1.85% |
| deeplabv3_mobilenetv2_f32.onnx | 48.565125 | 46.446841 | -2.118284 | -4.36% |
| imagetransformnet_f32.onnx | 303.835868 | 302.553625 | -1.282243 | -0.42% |
| mobilenet_v1_f32.onnx | 4.379468 | 4.163018 | -0.216450 | -4.94% |
| mobilenetv1_ssd_f32.onnx | 9.245055 | 8.881198 | -0.363857 | -3.94% |
| openposev2_vgg19_f32.onnx | 210.981128 | 209.199398 | -1.781730 | -0.84% |
| retinaface_f32.onnx | 42.326391 | 38.454346 | -3.872045 | -9.15% |
| rfdn_f32.onnx | 13.929565 | 13.679875 | -0.249690 | -1.79% |

Signed-off-by: Jonathan Clohessy <Jonathan.Clohessy@arm.com>
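The idea in the summary can be illustrated with a minimal sketch. This is not the MLAS or KleidiAI code: `tile_kernel`, `gemm_via_scratch`, and `gemm_direct` are hypothetical names, and the sketch assumes a plain row-major `C = A * B` with no scaling or accumulation. The commit message does not state the exact conditions under which the direct store is possible, so none are modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a GEMM tile kernel (not the KleidiAI API).
// Computes dst = A * B for row-major A (M x K) and B (K x N), storing each
// output row at a stride of dst_stride floats.
static void tile_kernel(const float* A, const float* B, float* dst,
                        std::size_t M, std::size_t N, std::size_t K,
                        std::size_t dst_stride) {
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < K; ++p) {
                acc += A[i * K + p] * B[p * N + j];
            }
            dst[i * dst_stride + j] = acc;
        }
    }
}

// Two-step pattern: the kernel fills a densely packed scratch buffer, and
// each row is then copied into C, whose rows are ldc floats apart.
void gemm_via_scratch(const float* A, const float* B, float* C,
                      std::size_t M, std::size_t N, std::size_t K,
                      std::size_t ldc) {
    assert(ldc >= N);
    std::vector<float> scratch(M * N);
    tile_kernel(A, B, scratch.data(), M, N, K, N);
    for (std::size_t i = 0; i < M; ++i) {
        std::memcpy(C + i * ldc, scratch.data() + i * N, N * sizeof(float));
    }
}

// Direct pattern: the kernel is handed C and its row stride, so no scratch
// buffer and no copy are needed.
void gemm_direct(const float* A, const float* B, float* C,
                 std::size_t M, std::size_t N, std::size_t K,
                 std::size_t ldc) {
    assert(ldc >= N);
    tile_kernel(A, B, C, M, N, K, ldc);
}
```

Both functions produce the same contents in `C` and leave any padding between `N` and `ldc` untouched. The direct version skips one allocation and one pass over the output, which is the kind of saving the benchmark table above would reflect.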