[quant][core][gpu][bug fix] Fixed off-by-one index issue in broadcasted_bias
Summary:
There was an off-by-one index issue in `new_size` (used for `broadcasted_bias`).
This has now been corrected.
The last dimension of the matrix multiplication's output is the number of out features, which is also the
size of bias. We create `new_size` for `broadcasted_bias`, which we want to have the same number
of dimensions as the matmul output and the same size in the last dimension, so that it broadcasts against the output.
Previously, we had `new_size[1] = bias_.value().size(0);`, but index 1 is the last dimension only when
the matmul output has exactly two dimensions, so this is wrong in general.
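To illustrate the indexing issue, here is a minimal Python sketch of the shape computation. It is not the C++ change itself: the function names `bias_shape_index_one` and `bias_shape_last_dim` are hypothetical, and plain lists stand in for tensor sizes. It only shows why writing to index 1 matches the last dimension for a 2-D output and diverges for outputs with more dimensions.

```python
def bias_shape_index_one(output_ndim, out_features):
    """Hypothetical helper mirroring the previous indexing: always write index 1."""
    new_size = [1] * output_ndim
    new_size[1] = out_features
    return new_size


def bias_shape_last_dim(output_ndim, out_features):
    """Hypothetical helper reflecting the stated intent: write the last index."""
    new_size = [1] * output_ndim
    new_size[-1] = out_features
    return new_size


# 2-D output (batch, out_features): index 1 is the last dimension, so both agree.
print(bias_shape_index_one(2, 8))  # [1, 8]
print(bias_shape_last_dim(2, 8))   # [1, 8]

# 3-D output (batch, seq, out_features): index 1 is no longer the last dimension.
print(bias_shape_index_one(3, 8))  # [1, 8, 1]
print(bias_shape_last_dim(3, 8))   # [1, 1, 8]
```

With a shape of `[1, 8, 1]`, the bias would line up with the middle dimension of a 3-D output rather than with the out-features dimension, so it would not broadcast as intended.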
Test plan:
```
python test/test_quantization.py -k test_qlinear_cudnn
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75483
Approved by: https://github.com/jerryzh168