onnxruntime
[MLAS] Simplify & optimize Arm64 NCHWc Convolution kernels
#26691
Merged

Rohanjames1997
Rohanjames1997 Simplify the Fp32 Pointwise Kernel to use GEMM
a26554ad
Rohanjames1997 Simplify the Fp32 Depthwise kernel
ce9b1734
Rohanjames1997 Simplify the Fp32 Depthwise Conv kernel
147ad53e
Rohanjames1997 Make fp32 Depthwise Conv branchless
aae9b94d
Rohanjames1997 Make fp32 Depthwise Conv branchless
5f3789ae
Rohanjames1997 Make MlasConvFloatKernelNeonImpl branchless
2bb2f4bd
Rohanjames1997 Remove redundant code
d07c57ac
Rohanjames1997 Merge two loops
314cef5e
Rohanjames1997
hariharans29
azure-pipelines
hariharans29
Rohanjames1997
Rohanjames1997
Rohanjames1997 Make ReLU branchless in Fp32 Pointwise Conv
0efd3480
hariharans29
azure-pipelines
hariharans29
Rohanjames1997
Rohanjames1997 Merge branch 'microsoft:main' into simplify-fp32-conv
eac67e1a
hariharans29
hariharans29
Rohanjames1997
Rohanjames1997 Fix the GEMM implementation and expand the test coverage.
a0d660c6
Rohanjames1997 Fix segfault in ConvNoBiasAddFusion
7e421be0
Rohanjames1997 Eliminate potential segfault
1d7d1deb
hariharans29
azure-pipelines
hariharans29 changed the title from "Simplify & optimize Arm64 NCHWc Convolution kernels" to "[MLAS] Simplify & optimize Arm64 NCHWc Convolution kernels" 184 days ago
hariharans29 requested a review from copilot-pull-request-reviewer 184 days ago
copilot-pull-request-reviewer
copilot-pull-request-reviewer commented on 2025-12-11
hariharans29
Rohanjames1997 Copilot's suggestion for boundary checks
6dbf2020
Rohanjames1997
hariharans29
azure-pipelines
Rohanjames1997
hariharans29
hariharans29 approved these changes on 2025-12-15
hariharans29 merged 84e657a1 into main 180 days ago
