[MLAS] Simplify & optimize Arm64 NCHWc Convolution kernels #26691
Simplify the Fp32 Pointwise Kernel to use GEMM
a26554ad
Simplify the Fp32 Depthwise kernel
ce9b1734
Simplify the Fp32 Depthwise Conv kernel
147ad53e
Make fp32 Depthwise Conv branchless
aae9b94d
Make fp32 Depthwise Conv branchless
5f3789ae
Make MlasConvFloatKernelNeonImpl branchless
2bb2f4bd
Remove redundant code
d07c57ac
Merge two loops
314cef5e
Make ReLU branchless in Fp32 Pointwise Conv
0efd3480
Merge branch 'microsoft:main' into simplify-fp32-conv
eac67e1a
Fix the GEMM implementation and expand the test coverage.
a0d660c6
Fix segfault in ConvNoBiasAddFusion
7e421be0
Eliminate potential segfault
1d7d1deb
hariharans29 changed the title from "Simplify & optimize Arm64 NCHWc Convolution kernels" to "[MLAS] Simplify & optimize Arm64 NCHWc Convolution kernels" 184 days ago
Copilot's suggestion for boundary checks
6dbf2020
Assignees: No one assigned