[PyTorch Edge][QNNPack] Depthwise Conv3d mp8x27 (per channel) Neon Kernel (#69313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69313
Adds a NEON kernel that enables quantized depthwise conv3d with a 3x3x3 kernel (27 taps per channel).
The implementation is based heavily on [mp8x25-neon-per-channel.c](https://www.internalfb.com/code/fbsource/[679135d62c0a64e3d0fa0c830aa062ac28f292b8]/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8dwconv/mp8x25-neon-per-channel.c), the depthwise conv2d kernel for 5x5 filters.
The kernel targets per-channel quantized convolution, but it also works for the non-per-channel case.
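For readers unfamiliar with the arithmetic, the per-channel computation for a single output position can be sketched in scalar Python. This is an illustration only and not the NEON kernel from this diff: the function name, the argument layout, the affine uint8 quantization scheme, and the use of Python's `round` for requantization are all assumptions made for the example.

```python
from typing import List, Sequence

KERNEL_TAPS = 3 * 3 * 3  # 27 taps, matching the "x27" in the kernel name


def dwconv3d_q8_per_channel_ref(
    taps: Sequence[Sequence[int]],      # [27][channels] uint8 inputs under the 3x3x3 window
    weights: Sequence[Sequence[int]],   # [27][channels] uint8 filter values
    bias: Sequence[int],                # [channels] int32 bias
    input_zero_point: int,
    kernel_zero_points: Sequence[int],  # [channels], one zero point per channel
    requant_scales: Sequence[float],    # [channels], one requantization scale per channel
    output_zero_point: int,
) -> List[int]:
    """Hypothetical scalar reference for one output position (not the real kernel)."""
    out = []
    for c in range(len(bias)):
        acc = bias[c]
        # Depthwise: each channel is convolved only with its own 27 weights.
        for t in range(KERNEL_TAPS):
            acc += (taps[t][c] - input_zero_point) * (weights[t][c] - kernel_zero_points[c])
        # Requantize with this channel's scale, then clamp to the uint8 range.
        # Assumption: Python's round(); the real kernel's rounding may differ.
        q = round(acc * requant_scales[c]) + output_zero_point
        out.append(max(0, min(255, q)))
    return out


# Two channels, every input tap equal to 130, per-channel weights and scales.
taps = [[130, 130] for _ in range(KERNEL_TAPS)]
weights = [[3, 5] for _ in range(KERNEL_TAPS)]
per_channel = dwconv3d_q8_per_channel_ref(
    taps, weights, [10, -18], 128, [1, 2], [0.5, 0.25], 3
)

# The non-per-channel case is the same call with one value repeated per channel.
uniform_weights = [[3, 3] for _ in range(KERNEL_TAPS)]
per_tensor = dwconv3d_q8_per_channel_ref(
    taps, uniform_weights, [10, -18], 128, [1, 1], [0.5, 0.5], 3
)
```

The second call illustrates why a per-channel kernel can also serve the non-per-channel case: the shared zero point and scale are simply replicated across channels.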
Generated files (caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/wrappers/q8dwconv/*) were produced with:
- cd caffe2/aten/src/ATen/native/quantized/cpu/qnnpack
- python3 generate-wrapper.py
ghstack-source-id: 146346785
Test Plan: Tested when the kernel is used by depthwise conv3d later in this diff stack (D31966574)
Reviewed By: kimishpatel
Differential Revision: D32074096
fbshipit-source-id: 8111926df6ecb89d88ca810deeab87b1c072f55a