[quant][core][gpu][improvement] Enabled broadcasting multiplication support for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops
Summary:
Previously, requantize_multiplier_tensor was set to the same size as
quantized_output, because cuDNN did not support broadcasting for
multiplication. cuDNN added this support as of 8.3.3. The requirement
is that requantize_multiplier_tensor must still be a tensor, but it
can now be a scalar tensor (every dimension of size 1) with the same
number of dimensions as the tensor it is multiplied with.
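The shape requirement can be illustrated with plain PyTorch broadcasting (a sketch of the semantics only, not the cuDNN API; the tensor names and shapes are illustrative):

```python
import torch

# Stand-in for the output of a quantized 2d op: shape (N, C, H, W).
quantized_output = torch.randn(2, 3, 4, 4)

# Old approach: the multiplier had to match quantized_output's full shape.
old_multiplier = torch.full((2, 3, 4, 4), 0.5)

# New approach: a scalar tensor suffices, as long as it keeps the same
# number of dimensions (4 here), each of size 1, so it broadcasts.
requantize_multiplier_tensor = torch.full((1, 1, 1, 1), 0.5)

result = quantized_output * requantize_multiplier_tensor
assert result.shape == quantized_output.shape
assert torch.equal(result, quantized_output * old_multiplier)
```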
Test Plan:
```
python test/test_quantization.py -k test_qconv2d_cudnn
python test/test_quantization.py -k test_qadd_relu_cudnn
python test/test_quantization.py -k test_qlinear_cudnn
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76518
Approved by: https://github.com/jerryzh168