[quant][gpu][core] Added quantized linear operator in cudnn (#73959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73959
This PR is similar to https://github.com/pytorch/pytorch/pull/70622, but for the linear operator.
Unlike PR 70622, this implementation uses packed parameters directly, rather than the refactored approach used for the conv operator, and it also directly implements bias & relu.
Currently, int8 matrix multiplication is not supported in cudnn; support is expected in the first half of April 2022. As a temporary workaround, we cast our int8 tensors to fp32 prior to the matmul.
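The workaround can be sketched in isolation as follows. This is a minimal NumPy illustration of the idea (cast int8 operands to fp32, matmul, rescale, add bias, apply relu), not the PR's actual cudnn code; the function name and signature are hypothetical.

```python
import numpy as np

def qlinear_relu_fp32_workaround(x_int8, w_int8, bias_fp32, x_scale, w_scale):
    # cudnn lacks int8 matmul support, so cast the int8 operands to fp32 first.
    # (Illustrative sketch only; names here are not the PR's API.)
    x_fp32 = x_int8.astype(np.float32)
    w_fp32 = w_int8.astype(np.float32)
    # Matmul in fp32, then rescale by the product of the quantization scales.
    out = (x_fp32 @ w_fp32.T) * (x_scale * w_scale) + bias_fp32
    # Fused relu, as the operator implements bias & relu directly.
    return np.maximum(out, 0.0)

x = np.array([[1, -2], [3, 4]], dtype=np.int8)
w = np.array([[1, 1], [-1, 2]], dtype=np.int8)
b = np.zeros(2, dtype=np.float32)
result = qlinear_relu_fp32_workaround(x, w, b, 0.5, 0.5)
```

Once cudnn ships native int8 matmul, the fp32 casts can be dropped without changing the operator's interface.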
Test Plan:
```
python test/test_quantization.py TestQuantizedLinear.test_qlinear_cudnn
```
Imported from OSS
Differential Revision:
D34824251
Reviewed By: jerryzh168
Pulled By: dzdang
fbshipit-source-id: 47139796782ade8d030ba2f9968a9abdd3a91d2f
(cherry picked from commit eade369f608e2bb59b734a45aca5f8257f07d6b2)