[PyTorch Edge][QNNPack] Depthwise Conv3d mp8x27 (per-channel) Sse2 Kernel (#69314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69314
Implementation based on [convolution-operator-tester.h](https://www.internalfb.com/code/fbsource/[679135d62c0a64e3d0fa0c830aa062ac28f292b8]/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/test/convolution-operator-tester.h)
Generated files (caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/wrappers/q8dwconv/*) were produced with:
- cd caffe2/aten/src/ATen/native/quantized/cpu/qnnpack
- python3 generate-wrapper.py
The math used to compute ```w_zyxc_ptr``` is explained here:
{F681213069}
ghstack-source-id: 146346784
Test Plan: To be tested when this kernel is used by depthwise conv3d later in this diff stack (D31966574)
Reviewed By: kimishpatel
Differential Revision: D32261231
fbshipit-source-id: 8e793696f7c3b0e7cceda88df8099f64f3c69ac4