[quant][core][gpu][improvement] Supported int8 matmul for quantized linear cudnn op
Summary:
This PR bumps the cudnn requirement to v8.4.0, which adds support for int8 matmul.
The previous implementation of the quantized linear cudnn operator used cudnn v8.3.3,
which did not support int8 matmul, so we had to convert the int8 matmul to an fp matmul.
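For context, below is a minimal sketch of the call path this change affects, loosely modeled on how the quantized linear op is exercised in the test plan; the shapes, scale/zero_point values, and prepack arguments are illustrative assumptions, not values taken from this PR.
```
import torch

# Quantize input and weight to int8 on the GPU (illustrative
# scales/zero_points; the cudnn path expects zero_point == 0 for qint8).
x = torch.randn(4, 8, device="cuda")
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)

w = torch.randn(16, 8, device="cuda")
qw = torch.quantize_per_tensor(w, scale=0.1, zero_point=0, dtype=torch.qint8)

# Prepack the weight (no bias here), then run the quantized linear op.
# On CUDA tensors this dispatches to the cudnn implementation, where with
# cudnn >= 8.4.0 the int8 x int8 matmul now runs natively in int8 instead
# of being converted to an fp matmul.
packed = torch.ops.quantized.linear_prepack(qw, None)
out = torch.ops.quantized.linear(qx, packed, 0.2, 0)  # output scale, zero_point
```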
Test plan:
```
python test/test_quantization.py -k test_qlinear_cudnn
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75418
Approved by: https://github.com/jerryzh168