[quant][core][gpu][bug fix] Added clone() and contiguous() to broadcasted_bias tensor in quantized cudnn linear op
Summary:
The previous implementation of broadcasted_bias in the quantized cudnn
linear op had two issues.
1) broadcasted_bias was a view of the input bias tensor. This is not
desired, as any modification to broadcasted_bias would also be applied
to the input bias. To remedy this, we clone the input bias tensor.
2) Calling broadcast_to does not materialize the broadcast in storage,
which is problematic for the cudnn operations. We need a fully
broadcasted tensor rather than a view (which is what broadcast_to
returns). To remedy this, we call contiguous().
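The two issues can be illustrated with a small standalone sketch (the shapes here are hypothetical and not taken from the op's actual code):

```python
import torch

bias = torch.randn(4)

# broadcast_to returns a view: it reports the expanded size but uses
# stride 0 along the broadcast dimension, so the underlying storage
# still holds only the original 4 elements.
view = bias.broadcast_to(3, 4)
assert view._base is bias        # still a view of the input bias
assert view.stride() == (0, 1)   # broadcast not materialized in storage

# clone() detaches the result from the input's storage, and
# contiguous() materializes the broadcast so every element of the
# (3, 4) tensor has its own memory location.
broadcasted_bias = bias.broadcast_to(3, 4).clone().contiguous()
assert broadcasted_bias._base is None      # no longer aliases the input
assert broadcasted_bias.is_contiguous()    # fully materialized storage
```

With this change, writes to broadcasted_bias cannot mutate the caller's bias, and cudnn sees a dense buffer of the full broadcasted shape.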
Test plan:
python test/test_quantization.py -k test_linear_cudnn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75944
Approved by: https://github.com/jerryzh168