[quant][core][performance] Changed cudnn quantized conv2d impl to use inplace operations (#73857)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73857
This PR changes the cudnn quantized conv2d operator to use inplace ops.
This makes the quantized conv operator more efficient when bias and/or relu is used.
Based on discussions, supporting inplace operations requires assigning unique uids
to the input and output of an operation, even if both are stored at the same memory address.
For example, see the different uids that the current implementation assigns to conv_output.data_ptr.
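The uid requirement can be illustrated with a minimal, library-free sketch. It is not the cudnn API or the code in this PR: `TensorDesc`, `build_variant_pack`, and the uid names are hypothetical, and `id(conv_output)` only stands in for conv_output.data_ptr. It shows three logical tensors that each get their own uid while all of them resolve to the same buffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorDesc:
    """Hypothetical descriptor: a graph-level uid plus a buffer address."""
    uid: str
    address: int


def build_variant_pack(descs):
    """Map each uid to its address, rejecting any uid that is used twice."""
    pack = {}
    for desc in descs:
        if desc.uid in pack:
            raise ValueError(f"duplicate uid: {desc.uid!r}")
        pack[desc.uid] = desc.address
    return pack


# One buffer that every op writes to inplace.
conv_output = [0.0] * 4
addr = id(conv_output)  # assumption: stand-in for conv_output.data_ptr

# Each logical tensor gets its own uid, even though the address is shared.
descs = [
    TensorDesc("conv_out", addr),  # output of the convolution
    TensorDesc("add_out", addr),   # output of the bias add, written inplace
    TensorDesc("relu_out", addr),  # output of relu, written inplace
]
pack = build_variant_pack(descs)
```

In this sketch, reusing a single uid for the input and output of an inplace op raises `ValueError`, which mirrors the constraint described above: the uids identify distinct tensors in the operation graph, and the addresses are free to coincide.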
Test Plan:
In the pytorch main directory, run
```
python test/test_quantization.py TestQuantizedConv.test_qconv2d_cudnn
```
for accuracy testing and
```
python test/test_quantization.py TestQuantizedConv.test_benchmark
```
for benchmark testing.
Reviewed By: ezyang
Differential Revision: D34824250
Pulled By: dzdang
fbshipit-source-id: 4d0d2fd61245d4a2cbbdffb910eb73a5807237fd
(cherry picked from commit fe21915492e14d9f97dcfc62dba8e9b237ebdb84)