[Quant][core][gpu][improvement] Refactored implementation for conv2d_cudnn to use packed parameters (#73510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73510
The previous implementation introduced in https://github.com/pytorch/pytorch/pull/70622
and expanded on in https://github.com/pytorch/pytorch/pull/72770,
https://github.com/pytorch/pytorch/pull/73035, https://github.com/pytorch/pytorch/pull/73337
did not make use of packed parameters. This PR refactors the existing
implementation to use packed parameters for cudnn conv2d in the same manner
as was done for qnnpack and fbgemm in the following files:
aten/src/ATen/native/quantized/cpu/fbgemm_utils.h.
aten/src/ATen/native/quantized/cpu/qnnpack_utils.h.
aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp.
aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp (note this file will be
refactored into two files (one located in /quantized/ and the other in
/quantized/cpu/) in a subsequent PR, as we are currently using the dispatch introduced
in this file for the cudnn operator as well)
This allows for all cudnn operators to be registered as quantized::conv2d,
quantized::conv2d_relu, quantized::conv2d_prepack, and to allow the dispatcher
to determine which backend to use (e.g., cuda/cudnn, fbgemm, or qnnpack).
Test cases were also modified to adhere to the methodology of using
prepacking the weight & bias prior to passing it into the conv2d operator.
We also ensured that the refactorization did not result in a reduction in speed
by verifying that the computation times in the benchmark test case (see test plan below)
are consistent with the results pre-refactorization.
Note the following:
apply_impl is now what was formerly raw_cudnn_convolution_forward
apply_impl_helper is now what was formerly raw_cudnn_convolution_forward_out
Test Plan:
In pytorch main directory, execute
```
python test/test_quantization.py TestQuantizedConv.test_qconv2d_cudnn
```
for accuracy testing and
```
python test/test_quantization.py TestQuantizedConv.test_benchmark
```
for benchmark testing.
In pytorch main directory, execute
```
python test/test_quantization.py TestQuantizedConv.test_qconv2d_cudnn
```
for accuracy testing and
```
python test/test_quantization.py TestQuantizedConv.test_benchmark
```
for benchmark testing.
Differential Revision:
D34803275
D34803275
Reviewed By: jerryzh168
Pulled By: dzdang
fbshipit-source-id: 299479c0315f41d758ab62125c9e5e7074e372e8
(cherry picked from commit 03d9e68712f0ed5f39aedd9c29589f7f19b3a082)