Quant: add weight int4pack mm kernel (#110914)
Adds the weight int4pack mm CUDA kernel. The kernel comes from the tinygemm project, which was developed by Jeff Johnson.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110914
Approved by: https://github.com/Chillee