pytorch
70830b5a - [QNNPACK, Sparsity] Sparse kernel with 4x8 blocking (#50590)

Commit
3 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50590

Larger blocking across the M dimension, such as the 8 used in the previous PR, is likely introducing wasted compute on the shapes being benchmarked. This change introduces 4x8 blocking (mr x nr). It helps in two ways: 1) less data is packed for small values of M, and 2) the compute kernel writes the same number of bytes, but more contiguously. This is not certain, but it likely helps.

Test Plan:
q8gemm-sparse-test
fully-connected-sparse-test

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25925499

fbshipit-source-id: 01c661ceea38bd6ee8321bb85cf1d5da5de4e984
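
The wasted-compute argument for small M can be illustrated with a little arithmetic. The sketch below is not QNNPACK code. It assumes that a kernel processes rows in tiles of mr and pads the final partial tile up to a full one. The helper names and the sample values of M are hypothetical and are not the shapes benchmarked in the PR.

```python
def padded_rows(m: int, mr: int) -> int:
    """Rows processed when m is rounded up to a multiple of mr (assumed padding model)."""
    return -(-m // mr) * mr  # ceiling division, then scale back up


def wasted_fraction(m: int, mr: int) -> float:
    """Fraction of processed rows that are padding rather than real data."""
    total = padded_rows(m, mr)
    return (total - m) / total


if __name__ == "__main__":
    # Hypothetical M values chosen only to show the effect of the row tile size.
    for m in (1, 4, 5, 12):
        print(
            f"M={m:2d}  mr=8: {padded_rows(m, 8):2d} rows "
            f"({wasted_fraction(m, 8):.0%} padding)  "
            f"mr=4: {padded_rows(m, 4):2d} rows "
            f"({wasted_fraction(m, 4):.0%} padding)"
        )
```

Under this model, M = 4 fills one tile exactly when mr = 4, while with mr = 8 half of the processed rows are padding. When M is a multiple of 8 the two tile sizes process the same number of rows, so any difference would show up only for small or unaligned M.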