[quant][gpu][core] Implemented quantized add operator using cudnn [reland PR74463] (#74463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74463
This PR implements the quantized add operator using cuDNN operations and
adds a corresponding test function in test_quantized_op.py. Ideally, this
function would be merged with the CPU variant, but for now it is kept
separate until cuDNN v8 is in the default build. The merge is further
complicated because the cuDNN quantized add currently supports only int8
symmetrically quantized tensors.
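For readers unfamiliar with the constraint above, here is a minimal reference sketch of what an int8 symmetric quantized add followed by ReLU computes. It is plain Python and is not the cuDNN code path in this PR. The helper names (`quantize`, `dequantize`, `qadd_relu_reference`) are hypothetical, and the dequantize, add, ReLU, requantize ordering is an assumption about the reference semantics, suggested by the test name rather than stated here.

```python
# Reference sketch only: plain Python, NOT the cuDNN implementation.
# All names below are hypothetical and exist only for illustration.

QMIN, QMAX = -128, 127  # representable int8 range


def quantize(x, scale):
    """Symmetric quantization: the zero point is fixed at 0."""
    q = round(x / scale)
    return max(QMIN, min(QMAX, q))  # saturate to int8


def dequantize(q, scale):
    """Map an int8 value back to a real number (zero point is 0)."""
    return q * scale


def qadd_relu_reference(qa, qb, scale_a, scale_b, out_scale):
    """Assumed semantics: dequantize both inputs, add, apply ReLU, requantize."""
    out = []
    for a, b in zip(qa, qb):
        s = dequantize(a, scale_a) + dequantize(b, scale_b)
        s = max(s, 0.0)  # ReLU
        out.append(quantize(s, out_scale))
    return out
```

Because the zero point is always 0 in the symmetric case, only the scales take part in the arithmetic. Asymmetric tensors would also need zero-point handling, which this sketch does not cover.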
Test Plan:
From the PyTorch root directory, run:
```
python test/test_quantization.py TestQuantizedOps.test_qadd_relu_cudnn
```
Reviewed By: ngimel
Differential Revision: D35218224
Pulled By: dzdang
fbshipit-source-id: a2e57e0b46cff655f2fb77000ea4db3a558a0851
(cherry picked from commit 3c2bae0ac95679a04f2445a4b596784bad556d78)