pytorch
85c790ed - [Quant][core][gpu][improvement] Refactored implementation for conv2d_cudnn to use packed parameters (#73510)

Commit

2 years ago

[Quant][core][gpu][improvement] Refactored implementation for conv2d_cudnn to use packed parameters (#73510)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73510

The previous implementation, introduced in https://github.com/pytorch/pytorch/pull/70622 and expanded on in https://github.com/pytorch/pytorch/pull/72770, https://github.com/pytorch/pytorch/pull/73035, and https://github.com/pytorch/pytorch/pull/73337, did not make use of packed parameters. This PR refactors the existing implementation to use packed parameters for cudnn conv2d in the same manner as was done for qnnpack and fbgemm in the following files:

- aten/src/ATen/native/quantized/cpu/fbgemm_utils.h
- aten/src/ATen/native/quantized/cpu/qnnpack_utils.h
- aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp (note: this file will be refactored into two files, one located in /quantized/ and the other in /quantized/cpu/, in a subsequent PR, as we are currently using the dispatch introduced in this file for the cudnn operator as well)

This allows all cudnn operators to be registered as quantized::conv2d, quantized::conv2d_relu, and quantized::conv2d_prepack, and allows the dispatcher to determine which backend to use (e.g., cuda/cudnn, fbgemm, or qnnpack).

Test cases were also modified to follow the convention of prepacking the weight and bias before passing them into the conv2d operator. We also verified that the refactoring did not reduce speed: the computation times in the benchmark test case (see test plan below) are consistent with the pre-refactoring results.

Note the following:

- apply_impl is what was formerly raw_cudnn_convolution_forward
- apply_impl_helper is what was formerly raw_cudnn_convolution_forward_out

Test Plan:
In the pytorch main directory, execute
```
python test/test_quantization.py TestQuantizedConv.test_qconv2d_cudnn
```
for accuracy testing and
```
python test/test_quantization.py TestQuantizedConv.test_benchmark
```
for benchmark testing.

Differential Revision: D34803275

Reviewed By: jerryzh168

Pulled By: dzdang

fbshipit-source-id: 299479c0315f41d758ab62125c9e5e7074e372e8
(cherry picked from commit 03d9e68712f0ed5f39aedd9c29589f7f19b3a082)
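The packed-parameter pattern described in the commit message can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not the PyTorch implementation: `PackedConvParams`, `register_prepack`, `conv_prepack`, `conv`, and `conv_relu` are hypothetical names, the "convolution" is a 1-D sliding dot product over Python lists, and a plain backend string stands in for the real dispatcher's backend selection.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class PackedConvParams:
    """Hypothetical packed-parameter object: weight and bias bundled once, up front."""
    backend: str
    weight: Tuple[float, ...]
    bias: float

    def apply(self, x: List[float]) -> List[float]:
        # Toy 1-D "valid" sliding dot product plus bias (stand-in for conv2d).
        k = len(self.weight)
        return [
            sum(w * x[i + j] for j, w in enumerate(self.weight)) + self.bias
            for i in range(len(x) - k + 1)
        ]

    def apply_relu(self, x: List[float]) -> List[float]:
        # Same computation followed by ReLU (stand-in for conv2d_relu).
        return [max(0.0, v) for v in self.apply(x)]

    def unpack(self) -> Tuple[List[float], float]:
        # Recover the original weight and bias from the packed object.
        return list(self.weight), self.bias


# Hypothetical registry: one prepack function per backend name.
_PREPACK: Dict[str, Callable[[List[float], float], PackedConvParams]] = {}


def register_prepack(backend: str):
    def decorator(fn):
        _PREPACK[backend] = fn
        return fn
    return decorator


@register_prepack("cudnn")
def _prepack_cudnn(weight: List[float], bias: float) -> PackedConvParams:
    return PackedConvParams("cudnn", tuple(weight), bias)


@register_prepack("fbgemm")
def _prepack_fbgemm(weight: List[float], bias: float) -> PackedConvParams:
    return PackedConvParams("fbgemm", tuple(weight), bias)


def conv_prepack(weight: List[float], bias: float, backend: str) -> PackedConvParams:
    # A single entry point; the registry picks the backend-specific packing.
    return _PREPACK[backend](weight, bias)


def conv(x: List[float], packed: PackedConvParams) -> List[float]:
    # Callers pass the packed object instead of raw weight and bias.
    return packed.apply(x)


def conv_relu(x: List[float], packed: PackedConvParams) -> List[float]:
    return packed.apply_relu(x)


packed = conv_prepack([1.0, -1.0], 0.5, backend="cudnn")
result = conv([1.0, 3.0, 2.0], packed)
result_relu = conv_relu([1.0, 3.0, 2.0], packed)
```

The point of the pattern is that callers see one set of operator names regardless of backend, and the weight and bias are packed once rather than passed to every call, which mirrors the way the modified test cases prepack before invoking the conv2d operator.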

References

#74332 - Merge master into lazy_tensor_staging

Author

dzdang

Committer

pytorchmergebot

Parents

ef9023e9
