pytorch
6dcbf396 - [QNNPACK, Sparsity] Added prepacking base aarch32 kernels (#50589)

Committed 3 years ago
[QNNPACK, Sparsity] Added prepacking base aarch32 kernels (#50589)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50589

Adds:
1. An input prepacking kernel.
2. Compute kernels that process the prepacked activation.

The hunch is that input prepacking will help with (1) cache locality and (2) avoiding a lot of address-compute instructions. The cache-locality benefit mainly comes from the fact that we use mr=8 and nr=4: with mr of 8, cache-line evictions are likely, since the cache associativity is likely 4. Laying out transposed activations blocked by mr=8 places each transposed activation block in one contiguous run of memory. The downside is that we now transpose all the blocks regardless of whether they participate in compute; however, it is likely that the entire activation matrix participates in compute for some output block.

Also adds a benchmark.

Test Plan:
q8gemm-sparse-test
fully-connected-test-sparse

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25925502

fbshipit-source-id: b2c36419a2c5d23b4a49f25f9ee41cee8397c3be
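The layout idea described above can be sketched outside the actual aarch32 kernels. The snippet below is a hypothetical NumPy illustration (not the QNNPACK implementation): rows of the activation matrix are grouped into blocks of mr=8, and each block is stored transposed (k-major) so that one block occupies one contiguous span of the packed buffer. The function name and buffer shape are assumptions for illustration only.

```python
import numpy as np

MR = 8  # row-block size for the prepacked layout (mr=8, per the commit)

def prepack_activation(a):
    """Hypothetical sketch: pack an (m, k) activation matrix so each
    group of MR rows is stored transposed and contiguously.

    Ragged tail blocks are zero-padded; note the commit's caveat that
    every block is transposed whether or not it feeds a compute tile.
    """
    m, k = a.shape
    m_blocks = (m + MR - 1) // MR
    packed = np.zeros((m_blocks, k, MR), dtype=a.dtype)
    for b in range(m_blocks):
        rows = a[b * MR:(b + 1) * MR]          # up to MR rows of the activation
        packed[b, :, :rows.shape[0]] = rows.T  # transpose: k-major within block
    return packed.reshape(-1)                  # one flat contiguous buffer

# Usage: the first 8 entries of `packed` are column 0 of rows 0..7,
# i.e. one mr-block's data sits in one contiguous run.
a = np.arange(16 * 3, dtype=np.float32).reshape(16, 3)
packed = prepack_activation(a)
```

Because each mr-block is contiguous, a compute kernel iterating over one block touches a single streaming region rather than mr strided rows, which is the cache-locality argument made in the summary.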