[QNNPACK, Sparsity] Add 8x1 block sparse kernels for aarch32. (#51119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51119
Adds an aarch32 asm kernel for 8x1 block-sparse GEMM. Since the ukernel still
produces 4x8 output blocks, as with the 1x4 sparsity pattern, we can use the
same prepacking kernel for activations. It gets a tiny bit hacky, but it
allows us to reuse that kernel.
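For illustration only, here is a plain-C reference sketch of what an 8x1 block-sparse fully-connected kernel computes. It is not the QNNPACK asm kernel, and several assumptions are worth stating. Weights are fp32 rather than quantized uint8. They are stored in a hypothetical BCSR-style struct, `bcsr_8x1`, in which each block covers 8 output channels and 1 input channel. The names `sparse_fc_8x1`, `MR` and `NR` are also illustrative. The loop nest emits output in MR x NR = 4x8 tiles, the same tile shape the summary cites as the reason the 1x4 activation prepacking can be reused. For simplicity, the sketch reads activations directly instead of prepacking them.

```c
#include <stddef.h>
#include <stdint.h>

#define MR 4 /* activation rows per output tile (assumed, mirrors 4x8) */
#define NR 8 /* output channels per tile == rows in one 8x1 block */

/* Hypothetical BCSR layout: block row br spans output channels
   [br*NR, br*NR + NR); each stored block is one input channel wide. */
typedef struct {
  size_t num_block_rows;   /* output_channels / NR */
  const uint32_t* row_ptr; /* num_block_rows + 1 offsets into col_idx */
  const uint32_t* col_idx; /* input-channel index of each 8x1 block */
  const float* values;     /* NR weights per block, contiguous */
} bcsr_8x1;

/* out (M x N) = a (M x K) * W^T, where W (N x K) is stored as 8x1 blocks.
   Zero blocks are skipped entirely; each nonzero block contributes one
   activation value times 8 weights to an MR x NR accumulator tile. */
static void sparse_fc_8x1(const bcsr_8x1* w, const float* a, size_t M,
                          size_t K, float* out) {
  const size_t N = w->num_block_rows * NR;
  for (size_t m0 = 0; m0 < M; m0 += MR) {
    const size_t mr = (M - m0 < MR) ? (M - m0) : MR; /* tail rows */
    for (size_t br = 0; br < w->num_block_rows; br++) {
      float acc[MR][NR] = {{0}};
      for (uint32_t b = w->row_ptr[br]; b < w->row_ptr[br + 1]; b++) {
        const uint32_t k = w->col_idx[b];
        for (size_t i = 0; i < mr; i++) {
          const float x = a[(m0 + i) * K + k];
          for (size_t r = 0; r < NR; r++) {
            acc[i][r] += w->values[(size_t)b * NR + r] * x;
          }
        }
      }
      /* Store the finished 4x8 (or smaller, at the tail) output tile. */
      for (size_t i = 0; i < mr; i++) {
        for (size_t r = 0; r < NR; r++) {
          out[(m0 + i) * N + br * NR + r] = acc[i][r];
        }
      }
    }
  }
}
```

Consider a single block row (8 output channels) whose only nonzero block sits at input channel 1. Each output row is then that block's 8 weights scaled by the activation at input channel 1, and every other input channel is never touched.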
Test Plan:
q8gemm-sparse-test
fully-connected-sparse-test
Imported from OSS
Reviewed By: AshkanAliabadi
Differential Revision: D26077765
fbshipit-source-id: cc087b0ff717a613906d442ea73680e785e0ecc2