Add 16-bit and 8-bit row/col index versions of the q8gemm sparse kernels (#85245)
TLDR: see D39003528 for a clearer view of the actual changes in this diff, which makes reviewing easier.
___
The 32-bit versions are now generated by macros, which are also used to generate the 16-bit and 8-bit versions.
This diff shows almost every line in the .s files as modified, but most of those changes only add leading spaces and a trailing ;/ so that the lines can be contained in a macro. To generate these changes, I first wrote the macros without the spaces and ;/, and then ran a script (see the Python file in D39003528) to produce the final version.
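For reviewers who want a feel for what that mechanical step does, here is a minimal sketch of such a transformation. It is not the script from D39003528: the function name `wrap_for_macro`, the four-space indent, the handling of blank lines, and the sample assembly lines are all assumptions made for illustration.

```python
def wrap_for_macro(lines, indent="    ", suffix=";/"):
    """Indent each non-blank line and append a suffix so it can sit in a macro body.

    Hypothetical helper: the indent width and the suffix default are assumptions
    taken from the description above, not from the real script.
    """
    wrapped = []
    for line in lines:
        stripped = line.rstrip()
        if not stripped:
            # Assumption: blank lines are passed through unchanged.
            wrapped.append(stripped)
            continue
        wrapped.append(f"{indent}{stripped}{suffix}")
    return wrapped


if __name__ == "__main__":
    # Illustrative input only; these are not lines from the actual kernels.
    sample = ["VLD1.8 {d0}, [r3]!", "SUBS r1, r1, 1"]
    print("\n".join(wrap_for_macro(sample)))
```

Because every line receives the same prefix and suffix, the resulting diff touches nearly every line even when the underlying instruction is unchanged, which is why the pre-script version is easier to review.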
To review this diff more easily, see D39003528, which contains the code changes from before I ran the script and makes it much easier to see which lines actually changed.
Each version of this diff is synced with the same-numbered version of that diff (if I change this diff, I will mirror the change in the corresponding version of that diff).
Differential Revision: [D39003527](https://our.internmc.facebook.com/intern/diff/D39003527/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85245
Approved by: https://github.com/kimishpatel