Added antialias flag to interpolate (CUDA, bilinear and bicubic) (#70930)
Summary:
Description:
- Added an `antialias` flag to `interpolate` (CUDA)
- Implemented forward and backward passes for bicubic mode
- Added tests

Previous PR for CPU bilinear: https://github.com/pytorch/pytorch/pull/65142
Previous PR for CPU bicubic: https://github.com/pytorch/pytorch/pull/68819
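A minimal usage sketch of the flag this PR adds (the tensor and target sizes below are taken from the benchmarks; the flag applies to `bilinear` and `bicubic` modes):

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 906, 438)  # NCHW input, same shape as in the benchmarks

# Downsample with antialiasing enabled; with antialias=True the result
# matches PIL's resize much more closely than the default antialias=False.
out_bl = F.interpolate(x, size=(320, 196), mode="bilinear", antialias=True)
out_bc = F.interpolate(x, size=(320, 196), mode="bicubic", antialias=True)

print(out_bl.shape)  # torch.Size([1, 3, 320, 196])
```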
### Benchmarks
<details>
<summary>
Bilinear forward pass, PIL, PTH CPU and PTH CUDA
</summary>
Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112
```
Torch version: 1.11.0a0+gitd032369
Torch config: PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- CPU capability usage: AVX2
- CUDA Runtime 11.1
- NVCC architecture flags: -gencode;arch=compute_61,code=sm_61
- CuDNN 8.0.5
- Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF,
Num threads: 8
[----------------------------------- Downsampling (bilinear): torch.Size([1, 3, 906, 438]) -> (320, 196) -----------------------------------]
| Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 2851.2 | 874.1 | 57.1
channels_last non-contiguous torch.float32 | 2856.1 | 1155.8 | 130.6
Times are in microseconds (us).
[----------------------------------- Downsampling (bilinear): torch.Size([1, 3, 906, 438]) -> (460, 220) -----------------------------------]
| Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 3705.9 | 1005.8 | 66.3
channels_last non-contiguous torch.float32 | 3742.9 | 1332.8 | 143.5
Times are in microseconds (us).
[------------------------------------ Downsampling (bilinear): torch.Size([1, 3, 906, 438]) -> (120, 96) -----------------------------------]
| Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 1768.0 | 725.2 | 77.9
channels_last non-contiguous torch.float32 | 1753.7 | 942.5 | 144.0
Times are in microseconds (us).
[----------------------------------- Downsampling (bilinear): torch.Size([1, 3, 906, 438]) -> (1200, 196) ----------------------------------]
| Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 9522.6 | 2593.8 | 157.8
channels_last non-contiguous torch.float32 | 9513.5 | 3622.7 | 241.5
Times are in microseconds (us).
[----------------------------------- Downsampling (bilinear): torch.Size([1, 3, 906, 438]) -> (120, 1200) ----------------------------------]
| Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 2240.1 | 565.5 | 93.3
channels_last non-contiguous torch.float32 | 2244.2 | 972.7 | 170.8
Times are in microseconds (us).
[------------------------- Downsampling (bilinear): torch.Size([1, 1, 906, 438]) -> (320, 196) --------------------------]
| Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
contiguous torch.float32 | 1441.3 | 386.1 | 22.3
Times are in microseconds (us).
[------------------------- Downsampling (bilinear): torch.Size([1, 1, 906, 438]) -> (460, 220) --------------------------]
| Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
contiguous torch.float32 | 1815.2 | 376.8 | 27.8
Times are in microseconds (us).
[-------------------------- Downsampling (bilinear): torch.Size([1, 1, 906, 438]) -> (120, 96) --------------------------]
| Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
contiguous torch.float32 | 962.3 | 400.0 | 29.4
Times are in microseconds (us).
[------------------------- Downsampling (bilinear): torch.Size([1, 1, 906, 438]) -> (1200, 196) -------------------------]
| Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
contiguous torch.float32 | 4749.7 | 910.1 | 63.7
Times are in microseconds (us).
[------------------------- Downsampling (bilinear): torch.Size([1, 1, 906, 438]) -> (120, 1200) -------------------------]
| Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
contiguous torch.float32 | 1098.1 | 272.0 | 36.4
Times are in microseconds (us).
```
</details>
<details>
<summary>
Bicubic forward pass, PIL, PTH CPU and PTH CUDA
</summary>
Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112
```
Torch version: 1.11.0a0+gitd032369
Torch config: PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- CPU capability usage: AVX2
- CUDA Runtime 11.1
- NVCC architecture flags: -gencode;arch=compute_61,code=sm_61
- CuDNN 8.0.5
- Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF,
Num threads: 8
[------------------------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (320, 196) -----------------------------------]
| Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 4522.4 | 1406.7 | 170.3
channels_last non-contiguous torch.float32 | 4530.0 | 1435.4 | 242.2
Times are in microseconds (us).
[------------------------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (460, 220) -----------------------------------]
| Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 5726.4 | 1628.6 | 164.0
channels_last non-contiguous torch.float32 | 5722.6 | 1665.6 | 234.7
Times are in microseconds (us).
[------------------------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 96) ------------------------------------]
| Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 2909.1 | 1461.5 | 276.9
channels_last non-contiguous torch.float32 | 2892.9 | 1458.7 | 345.1
Times are in microseconds (us).
[----------------------------------- Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (1200, 196) -----------------------------------]
| Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 14699.2 | 4283.9 | 407.1
channels_last non-contiguous torch.float32 | 14711.3 | 4321.1 | 477.0
Times are in microseconds (us).
[----------------------------------- Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 1200) -----------------------------------]
| Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 3467.0 | 980.0 | 339.2
channels_last non-contiguous torch.float32 | 3465.2 | 982.3 | 407.8
Times are in microseconds (us).
[-------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (320, 196) --------------------------]
| Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
contiguous torch.float32 | 2396.7 | 877.8 | 68.1
Times are in microseconds (us).
[-------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (460, 220) --------------------------]
| Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
contiguous torch.float32 | 3068.2 | 777.3 | 64.7
Times are in microseconds (us).
[-------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 96) ---------------------------]
| Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
contiguous torch.float32 | 1540.2 | 829.3 | 100.4
Times are in microseconds (us).
[------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (1200, 196) --------------------------]
| Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
contiguous torch.float32 | 7919.5 | 1467.8 | 151.6
Times are in microseconds (us).
[------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 1200) --------------------------]
| Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
contiguous torch.float32 | 1695.7 | 631.2 | 117.7
Times are in microseconds (us).
```
</details>
<details>
<summary>
Bilinear backward pass, PTH CPU and PTH CUDA
</summary>
Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112
```
- Measure only backward op
Torch version: 1.11.0a0+gitd032369
Torch config: PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- CPU capability usage: AVX2
- CUDA Runtime 11.1
- NVCC architecture flags: -gencode;arch=compute_61,code=sm_61
- CuDNN 8.0.5
- Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF,
Num threads: 8
[------------- Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (320, 196) ------------]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 4686.8 | 215.7
channels_last non-contiguous torch.float32 | 5101.1 | 220.5
Times are in microseconds (us).
[------------- Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (460, 220) ------------]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 6011.2 | 204.4
channels_last non-contiguous torch.float32 | 6396.0 | 210.0
Times are in microseconds (us).
[------------- Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (120, 96) -------------]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 2035.6 | 250.2
channels_last non-contiguous torch.float32 | 1589.6 | 252.5
Times are in microseconds (us).
[------------ Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (1200, 196) ------------]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 11392.5 | 256.5
channels_last non-contiguous torch.float32 | 11640.2 | 263.9
Times are in microseconds (us).
[------------ Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (120, 1200) ------------]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 11769.6 | 465.9
channels_last non-contiguous torch.float32 | 12407.0 | 474.4
Times are in microseconds (us).
[---- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (320, 196) ----]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
contiguous torch.float32 | 3931.0 | 133.3
Times are in microseconds (us).
[---- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (460, 220) ----]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
contiguous torch.float32 | 5594.8 | 133.9
Times are in microseconds (us).
[---- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (120, 96) -----]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
contiguous torch.float32 | 1272.6 | 133.0
Times are in microseconds (us).
[--- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (1200, 196) ----]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
contiguous torch.float32 | 10618.1 | 134.0
Times are in microseconds (us).
[--- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (120, 1200) ----]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
contiguous torch.float32 | 11082.2 | 154.6
Times are in microseconds (us).
```
</details>
<details>
<summary>
Bicubic backward pass, PTH CPU and PTH CUDA
</summary>
Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112
```
- Measure only backward op
Torch version: 1.11.0a0+gitd032369
Torch config: PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- CPU capability usage: AVX2
- CUDA Runtime 11.1
- NVCC architecture flags: -gencode;arch=compute_61,code=sm_61
- CuDNN 8.0.5
- Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF,
Num threads: 8
[------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (320, 196) -------------]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 6791.2 | 618.9
channels_last non-contiguous torch.float32 | 7125.2 | 622.9
Times are in microseconds (us).
[------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (460, 220) -------------]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 8806.2 | 600.3
channels_last non-contiguous torch.float32 | 9167.6 | 607.5
Times are in microseconds (us).
[-------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 96) -------------]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 3683.6 | 693.8
channels_last non-contiguous torch.float32 | 3617.4 | 695.0
Times are in microseconds (us).
[------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (1200, 196) ------------]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 17548.2 | 779.4
channels_last non-contiguous torch.float32 | 17966.2 | 786.5
Times are in microseconds (us).
[------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 1200) ------------]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 28.4 | 1.6
channels_last non-contiguous torch.float32 | 28.4 | 1.6
Times are in milliseconds (ms).
[---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (320, 196) -----]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
contiguous torch.float32 | 6266.1 | 208.5
Times are in microseconds (us).
[---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (460, 220) -----]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
contiguous torch.float32 | 8218.3 | 200.8
Times are in microseconds (us).
[----- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 96) -----]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
contiguous torch.float32 | 3458.9 | 231.9
Times are in microseconds (us).
[---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (1200, 196) ----]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
contiguous torch.float32 | 15729.3 | 261.6
Times are in microseconds (us).
[---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 1200) ----]
| 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
contiguous torch.float32 | 26279.8 | 547.0
Times are in microseconds (us).
```
</details>
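The backward benchmarks above can be reproduced in spirit with a short autograd round-trip; this is a sketch, not the benchmark harness (see the linked gist for the actual measurement code):

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 906, 438, requires_grad=True)

# Forward with antialiasing, then backprop a scalar loss; the backward
# op exercised here is what the backward-pass tables measure.
y = F.interpolate(x, size=(320, 196), mode="bicubic", antialias=True)
y.sum().backward()

print(x.grad.shape)  # torch.Size([1, 3, 906, 438])
```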
The code was moved from torchvision (https://github.com/pytorch/vision/pull/4211) and optimized.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70930
Reviewed By: zou3519
Differential Revision: D33817902
Pulled By: jbschlosser
fbshipit-source-id: d63a620f8972ff36b63841f0bc6c820466f58f69
(cherry picked from commit d358cfdb7d1a9efb4b9254446cb9b1d0af35a26d)