ea833a5d - [quant][gpu][core] Added quantized linear operator in cudnn (#73959)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73959

This PR is similar to https://github.com/pytorch/pytorch/pull/70622, but for the linear operator. Unlike PR 70622, this implementation directly uses packed parameters rather than a refactorization, as was done for the conv operator, and it also directly implements bias and relu.

Currently, int8 matrix multiplication is not supported in cuDNN. The ETA for this support is the first half of April 2022. As a temporary workaround, we cast our int8 tensors to fp32 prior to the matmul.

Test Plan:
```
python test/test_quantization.py TestQuantizedLinear.test_qlinear_cudnn
```

Imported from OSS

Differential Revision: D34824251

Reviewed By: jerryzh168

Pulled By: dzdang

fbshipit-source-id: 47139796782ade8d030ba2f9968a9abdd3a91d2f

(cherry picked from commit eade369f608e2bb59b734a45aca5f8257f07d6b2)
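The fp32-cast workaround described above can be sketched numerically. This is a hedged illustration, not the code from the PR: `qlinear_via_fp32` is a hypothetical helper name, and NumPy stands in for the CUDA/cuDNN path. The idea is the same — widen the int8 activation and weight to fp32, do the matmul there, apply scales, bias, and optional relu, then requantize the result to int8.

```python
import numpy as np

def qlinear_via_fp32(x_int8, w_int8, bias_fp32, x_scale, w_scale,
                     out_scale, relu=False):
    """Quantized linear with the int8 matmul emulated in fp32.

    Hypothetical sketch of the workaround: cuDNN lacked int8 matmul,
    so the int8 operands are cast to fp32 before multiplying.
    """
    # Workaround: cast int8 operands to fp32 before the matmul.
    acc = x_int8.astype(np.float32) @ w_int8.astype(np.float32).T
    # Rescale the accumulator into real units and add the fp32 bias.
    y = acc * (x_scale * w_scale) + bias_fp32
    if relu:
        y = np.maximum(y, 0.0)  # fused relu on the fp32 result
    # Requantize the output back to int8.
    return np.clip(np.round(y / out_scale), -128, 127).astype(np.int8)

# Usage: one row of activations, a 2x2 weight matrix.
x = np.array([[1, 2]], dtype=np.int8)
w = np.array([[1, 1], [2, -2]], dtype=np.int8)
b = np.array([0.25, 0.5], dtype=np.float32)
out = qlinear_via_fp32(x, w, b, x_scale=0.5, w_scale=0.5, out_scale=0.5)
```

Because the accumulation happens in fp32 rather than int32, results can differ slightly from a true int8 kernel for large reduction dimensions, which is why the PR frames this as temporary pending native cuDNN int8 matmul support.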