[quant][core][gpu][improvement] Added support for padding quantized cudnn conv2d operator
Summary:
cudnn v8.4.0 expects the number of input channels for conv2d to be a
multiple of 4. If it is not, we need to explicitly pad the input channel
dimension to a multiple of 4 ourselves, as cudnn does not currently
support this padding intrinsically.
The padding implemented here is limited to groups=1; however, adapting
it to groups > 1 should be straightforward since we're only padding a
single dimension.
When cudnn enables support for padding, we can remove the padding on our
end.
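As a rough illustration of the idea (not the code in this PR), the
sketch below rounds a channel count up to a multiple of 4 and zero-pads
a per-channel list. The helper names `padded_channels` and
`pad_channels` are hypothetical, and the sketch assumes that both the
activation and the weight are zero-padded along the input-channel
dimension, so that for groups=1 the extra channels contribute nothing
to the output.

```python
def padded_channels(channels, multiple=4):
    """Round `channels` up to the nearest multiple (hypothetical helper)."""
    return -(-channels // multiple) * multiple


def pad_channels(values, multiple=4):
    """Append zeros so the per-channel list length is a multiple of `multiple`."""
    return values + [0] * (padded_channels(len(values), multiple) - len(values))


def dot(a, b):
    """One output element of a 1x1 conv with groups=1: a sum over input channels."""
    return sum(x * y for x, y in zip(a, b))


# Toy data with 5 input channels, which is not a multiple of 4.
activation = [3, -1, 2, 5, 4]
weight = [1, 2, -3, 0, 2]

# Assumption: both operands are padded along the input-channel dimension.
padded_activation = pad_channels(activation)
padded_weight = pad_channels(weight)

# The zero-padded channels add only zero terms, so the result is unchanged.
unpadded_result = dot(activation, weight)
padded_result = dot(padded_activation, padded_weight)
```

Here the 5 input channels are padded to 8, and `padded_result` equals
`unpadded_result` because the three extra terms are all zero.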
Test plan:
```
python test/test_quantization.py -k test_qconv2d_cudnn
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76184
Approved by: https://github.com/jerryzh168