[sparsity] Add m-out-of-n support in the WeightNormSparsifier (#65295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65295
The m-out-of-n scheme is implemented as follows:
1. Select the blocks to be sparsified using the weight-norm criterion
2. Within each block below the threshold, find the elements with the smallest absolute values
3. Zero out only those smallest values within each selected block
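The three steps above can be sketched in plain Python as follows. This is a framework-free illustration, not the WeightNormSparsifier code: the helper `m_out_of_n_mask`, the use of the L2 norm per block, the flattening of the weights into consecutive 1 x n blocks, and the floor rule for how many blocks fall below the threshold are all assumptions made for the example.

```python
import math
from typing import List


def m_out_of_n_mask(weights: List[float], n: int, m: int,
                    sparsity_level: float) -> List[int]:
    """Return a 0/1 mask for `weights`, treated as consecutive 1 x n blocks.

    Hypothetical helper for illustration only; not the WeightNormSparsifier API.
    """
    if len(weights) % n != 0:
        raise ValueError("length of weights must be a multiple of n")
    if not 0 <= m <= n:
        raise ValueError("m must satisfy 0 <= m <= n")

    blocks = [weights[i:i + n] for i in range(0, len(weights), n)]

    # Step 1: rank blocks by L2 norm; the lowest-norm fraction
    # `sparsity_level` of the blocks is selected (assumption: floor).
    norms = [math.sqrt(sum(w * w for w in block)) for block in blocks]
    num_to_sparsify = int(sparsity_level * len(blocks))
    order = sorted(range(len(blocks)), key=lambda b: norms[b])
    selected = set(order[:num_to_sparsify])

    mask = [1] * len(weights)
    for b, block in enumerate(blocks):
        if b not in selected:
            continue  # blocks above the threshold stay dense
        # Step 2: indices of the m smallest-magnitude elements in the block.
        smallest = sorted(range(n), key=lambda j: abs(block[j]))[:m]
        # Step 3: zero out only those m elements.
        for j in smallest:
            mask[b * n + j] = 0
    return mask


w = [0.1, -0.9, 0.2, 0.8, 5.0, 4.0, -3.0, 6.0]
print(m_out_of_n_mask(w, n=4, m=2, sparsity_level=0.5))  # [0, 1, 0, 1, 1, 1, 1, 1]
```

With `sparsity_level=0.5` only the low-norm first block is selected, and within it only the two smallest-magnitude entries (0.1 and 0.2) are masked.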
m-out-of-n describes a sparsification scheme in which, within a block of "n" elements, only "m" of them are zeroed out.
Block sparsity, where the whole block is zeroed, is a special case of m-out-of-n: if m == n, the whole block is reset.
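To make the special case concrete, the sketch below zeroes the m smallest-magnitude entries of a single block; setting m equal to the block length zeroes the entire block, which is plain block sparsity. The helper name `zero_smallest` is hypothetical and used only for illustration.

```python
from typing import List


def zero_smallest(block: List[float], m: int) -> List[float]:
    """Zero the m smallest-magnitude entries of one block (illustrative helper)."""
    # Indices of the m entries with the smallest absolute value.
    drop = set(sorted(range(len(block)), key=lambda j: abs(block[j]))[:m])
    return [0.0 if j in drop else w for j, w in enumerate(block)]


block = [0.3, -1.2, 0.05, 2.0]
print(zero_smallest(block, 2))           # 2-out-of-4: [0.0, -1.2, 0.0, 2.0]
print(zero_smallest(block, len(block)))  # m == n:     [0.0, 0.0, 0.0, 0.0]
```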
This follows the implementation described in https://github.com/pytorch/pytorch/issues/59835
and also meets the requirements of NVIDIA cuSPARSELt.
To support CUDA 2:4 sparsity, set sparsity_level to 1.0 with 1x4 blocks.
This means every 1x4 block within a tensor is sparsified with the 2-out-of-4 scheme.
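The following sketch shows what the 2-out-of-4 configuration produces on a small weight matrix, plus a checker for the resulting pattern. Because sparsity_level is 1.0, every block is selected, so no norm ranking is needed. The helpers `apply_2_of_4` and `is_2_of_4` are hypothetical, framework-free illustrations that assume blocks run along each row; they are not the PyTorch or cuSPARSELt APIs.

```python
from typing import List


def apply_2_of_4(matrix: List[List[float]]) -> List[List[float]]:
    """Sparsify every 1x4 block of each row with the 2-out-of-4 scheme.

    Mirrors sparsity_level=1.0: every block is selected. Illustrative only.
    """
    out = []
    for row in matrix:
        if len(row) % 4 != 0:
            raise ValueError("row length must be a multiple of 4")
        new_row = list(row)
        for start in range(0, len(row), 4):
            block = range(start, start + 4)
            # Zero the 2 smallest-magnitude elements of this 1x4 block.
            for j in sorted(block, key=lambda k: abs(row[k]))[:2]:
                new_row[j] = 0.0
        out.append(new_row)
    return out


def is_2_of_4(matrix: List[List[float]]) -> bool:
    """Check that every 1x4 block along each row holds at least 2 zeros."""
    return all(
        sum(1 for v in row[s:s + 4] if v == 0.0) >= 2
        for row in matrix
        for s in range(0, len(row), 4)
    )


W = [[0.5, -0.1, 0.3, 0.9, 1.0, 2.0, -3.0, 0.2],
     [4.0, 0.0, -0.7, 0.6, 0.1, 0.2, 0.3, 0.4]]
S = apply_2_of_4(W)
print(S[0])                       # [0.5, 0.0, 0.0, 0.9, 0.0, 2.0, -3.0, 0.0]
print(is_2_of_4(W), is_2_of_4(S))  # False True
```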
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31186828
Pulled By: z-a-f
fbshipit-source-id: 7bd3e2707915b90f4831859781fc6e25f716c618