[quant][core][gpu][cudnn] Added support for nhwc tensors in quantized cudnn add_relu op (#75806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75806
When using the quantized cuDNN add operator with 4D input tensors,
cuDNN v8.4.0 requires the tensors to be in NHWC format (older versions may have relaxed this constraint).
Previously, all tensors defaulted to NCHW format.
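For readers unfamiliar with the two layouts, here is a minimal pure-Python sketch of how the element strides differ for the same logical (N, C, H, W) shape. The helper names (`nchw_strides`, `nhwc_strides`, `offset`) are hypothetical and are not part of this change; in PyTorch the NHWC layout corresponds to the `torch.channels_last` memory format.

```python
def nchw_strides(n, c, h, w):
    """Element strides, in (N, C, H, W) order, for a densely packed NCHW buffer."""
    return (c * h * w, h * w, w, 1)


def nhwc_strides(n, c, h, w):
    """Element strides, still in (N, C, H, W) order, for a densely packed NHWC buffer."""
    return (h * w * c, 1, w * c, c)


def offset(index, strides):
    """Flat buffer offset of a logical (n, c, h, w) index under the given strides."""
    return sum(i * s for i, s in zip(index, strides))


shape = (2, 3, 4, 5)  # illustrative shape only, not taken from the test
print(nchw_strides(*shape))  # (60, 20, 5, 1)
print(nhwc_strides(*shape))  # (60, 1, 15, 3)

# In NHWC the channels of one pixel sit next to each other in memory,
# while in NCHW neighbouring channels are a full H*W plane apart.
print(offset((0, 1, 0, 0), nhwc_strides(*shape)))  # 1
print(offset((0, 1, 0, 0), nchw_strides(*shape)))  # 20
```

The logical shape is the same in both cases; only the mapping from index to memory offset changes.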
Test Plan:
```
python test/test_quantization.py -k test_qadd_relu_cudnn
```
Reviewed By: vkuzo
Differential Revision: D35651368
Pulled By: dzdang
fbshipit-source-id: b6ce49cf100b88c6fa29513ec50b38d445c3c02f
(cherry picked from commit 5936fe6783a02827bd93feb80d137da508d6facc)