Added per channel kernels for depthwise conv. (#37621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37621
Due to potential perf issues with reusing the existing depthwise conv
kernels for per channel depthwise conv, we opt for replicating the
kernels and adding per channel support to the copies.
Note that the large kernel files are largely duplicates of the original
kernels. The assembly kernels have slightly more modifications than the
intrinsics ones.
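
For readers unfamiliar with the distinction, here is a minimal scalar
sketch of what "per channel" could mean for a quantized depthwise conv.
It is not taken from this PR. It assumes that each channel carries its
own kernel zero point and requantization scale, and every name in it
(requantize_ref, dwconv_pixel_per_channel, the parameter layout, the
rounding mode) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper: scale an int32 accumulator, round half
 * away from zero, add the output zero point, and clamp to uint8. The
 * real kernels may round differently. */
static uint8_t requantize_ref(int32_t acc, float scale, uint8_t output_zero_point) {
  const float scaled = (float)acc * scale;
  int32_t r = (int32_t)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
  r += (int32_t)output_zero_point;
  if (r < 0) r = 0;
  if (r > 255) r = 255;
  return (uint8_t)r;
}

/* Hypothetical scalar reference for ONE output pixel of a depthwise conv.
 * input and kernel are laid out as [taps][channels] (assumed layout).
 * Per channel: kernel_zero_points[c] and requant_scales[c] are indexed by
 * channel instead of being a single value shared by all channels. */
static void dwconv_pixel_per_channel(
    size_t channels, size_t taps,
    const uint8_t* input, uint8_t input_zero_point,
    const uint8_t* kernel, const uint8_t* kernel_zero_points,
    const int32_t* bias, const float* requant_scales,
    uint8_t output_zero_point, uint8_t* output) {
  assert(channels > 0 && taps > 0);
  for (size_t c = 0; c < channels; c++) {
    int32_t acc = bias[c];
    for (size_t t = 0; t < taps; t++) {
      const int32_t x = (int32_t)input[t * channels + c] - (int32_t)input_zero_point;
      const int32_t w = (int32_t)kernel[t * channels + c] - (int32_t)kernel_zero_points[c];
      acc += x * w;
    }
    output[c] = requantize_ref(acc, requant_scales[c], output_zero_point);
  }
}
```

In a per tensor kernel the zero point and scale would be loop invariant
scalars. Loading them per channel changes the inner loop, which is one
plausible reason a shared kernel could cost perf on the per tensor path.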
Test Plan:
qnnpack tests.
q8dwconv-test
convolution-test
Differential Revision: D21339042
Pulled By: kimishpatel
fbshipit-source-id: f2c3413e1e1af0b1f89770b5e0f66f402d38aee8