[quant][gpu][core] Implemented quantized add operator using cudnn (#74463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74463
This PR implements the quantized add operator using cudnn operations and
adds a corresponding test function in test_quantized_op.py. Ideally, this
function would be merged with the cpu variant, but for now it stays
separate until cudnn v8 is in the default build. The merge is further
complicated because cudnn quantized add currently supports only int8
symmetrically quantized tensors.
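For reference, the semantics of an int8 symmetric quantized add (with
optional relu) can be sketched in plain Python. This is not the cudnn
implementation from this PR. The helper names are hypothetical, and the
sketch assumes zero_point = 0, an int8 range of [-128, 127], and Python's
round-half-to-even rounding.
```python
# Illustrative reference only: helper names are hypothetical and this does
# not call cudnn or PyTorch. Assumes symmetric quantization (zero_point = 0).
QMIN, QMAX = -128, 127


def clamp_int8(value):
    """Clamp an integer to the int8 range."""
    return max(QMIN, min(QMAX, value))


def quantize_symmetric(values, scale):
    """Map floats to int8 values using real = scale * q (zero_point = 0)."""
    return [clamp_int8(round(v / scale)) for v in values]


def qadd_reference(qa, scale_a, qb, scale_b, out_scale, relu=False):
    """Dequantize both inputs, add, optionally apply relu, then requantize."""
    out = []
    for a, b in zip(qa, qb):
        total = a * scale_a + b * scale_b  # add in the real (float) domain
        if relu:
            total = max(total, 0.0)
        out.append(clamp_int8(round(total / out_scale)))
    return out


qa = quantize_symmetric([1.0, -2.0, 0.5], 0.5)
qb = quantize_symmetric([0.5, 0.5, -1.0], 0.25)
print(qadd_reference(qa, 0.5, qb, 0.25, 0.5))
print(qadd_reference(qa, 0.5, qb, 0.25, 0.5, relu=True))
```
Because the zero point is fixed at 0, each input needs only its scale to
be dequantized, which is why this case is simpler than the general affine
scheme.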
Test Plan:
In the pytorch main directory, execute
```
python test/test_quantization.py TestQuantizedOps.test_qadd_relu_cudnn
```
TBA
Differential Revision:
D35009111
Reviewed By: jerryzh168
Pulled By: dzdang
fbshipit-source-id: 13afa7f0192ffaf1f36334b1af827202c7dd0f74
(cherry picked from commit 2b5759523e2fec5c849941552b904e412d67a138)