Support Linear operation with fp16 weights in ATen (#22023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22023
This diff implements a Linear operation with fp16 weights based on FBGEMM. At a high level, we want to perform the following operation:
Y = X * W + B, with dtypes (Y, X, W, B) = (fp32, fp32, fp16, fp32)
To do that, three steps are needed:
1. Quantize the weights from fp32 to fp16; this is done using `PackedGemmMatrixFP16` in `fbgemm_pack_gemm_matrix_fp16`.
2. Perform the matrix multiplication with the quantized weights using `cblas_gemm_compute` in `fbgemm_linear_fp16_weight`.
3. Add the bias to the result from step 2 and return the final Y (see the sketch after this list).
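A minimal usage sketch of the two ops, assuming they are exposed on the `torch` namespace with `(input)` and `(input, packed_weight, bias)` style signatures (the exact Python bindings are not shown in this summary, and a PyTorch build with FBGEMM support on x86 is required):

```python
import torch

# Shapes follow nn.Linear: X is (M, K), W is (N, K), B is (N,).
X = torch.randn(4, 8)    # fp32 activations
W = torch.randn(16, 8)   # fp32 weights; quantized to fp16 when packed
B = torch.randn(16)      # fp32 bias

# Step 1: quantize W to fp16 and pack it into FBGEMM's GEMM layout.
packed_W = torch.fbgemm_pack_gemm_matrix_fp16(W)

# Steps 2-3: GEMM with the fp16-packed weights plus bias; Y is fp32.
Y = torch.fbgemm_linear_fp16_weight(X, packed_W, B)

# Pure-fp32 reference; results should match up to fp16 rounding of W.
Y_ref = torch.nn.functional.linear(X, W, B)
print(torch.allclose(Y, Y_ref, rtol=1e-2, atol=1e-2))
```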
Reviewed By: jianyuh
Differential Revision: D15921768
fbshipit-source-id: dc4e5b366f846ce9d58975876940a9b3372b8b8d