Optimized generic interpolation using TensorIterator (keeps original 2d/3d channels last impl) (#54500)
Summary:
Related to https://github.com/pytorch/pytorch/issues/10482
A follow-up PR to https://github.com/pytorch/pytorch/pull/51653/
Description:
- Replaces the nearest/linear/cubic implementations with a generic interpolation implementation built on TensorIterator
- Retains the original 2d/3d channels last implementation, because the generic implementation is slower with 1 thread (see the appendix below)
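
As a rough illustration of what a "generic" interpolation can look like, the sketch below precomputes source indices and weights once per dimension and then applies them separably. It is plain Python and not the C++ TensorIterator kernel from this PR; the function names are hypothetical, and the `align_corners=False` coordinate convention is an assumption made for the example.

```python
import math


def linear_index_weights(in_size, out_size):
    """Precompute, per output position, two source indices and their weights.

    Assumption: align_corners=False convention (pixel centers at half-integers).
    """
    scale = in_size / out_size
    table = []
    for dst in range(out_size):
        # Map the output pixel center back into input coordinates, clamped at 0.
        src = max((dst + 0.5) * scale - 0.5, 0.0)
        i0 = int(math.floor(src))
        i1 = min(i0 + 1, in_size - 1)  # clamp the right neighbour at the border
        w1 = src - i0
        table.append((i0, 1.0 - w1, i1, w1))
    return table


def interpolate_1d(row, out_size):
    """Apply the precomputed (index, weight) pairs to one row of values."""
    table = linear_index_weights(len(row), out_size)
    return [w0 * row[i0] + w1 * row[i1] for i0, w0, i1, w1 in table]


def interpolate_2d(image, out_h, out_w):
    """Separable bilinear resize: along the width first, then along the height."""
    resized_rows = [interpolate_1d(row, out_w) for row in image]
    columns = [interpolate_1d(list(col), out_h) for col in zip(*resized_rows)]
    return [list(r) for r in zip(*columns)]


if __name__ == "__main__":
    print(interpolate_1d([0.0, 1.0], 4))
```

The same index/weight tables could serve nearest (one index, weight 1) or cubic (four indices) modes, which is the sense in which such a scheme is "generic" in this sketch.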
Speed-ups are observed for the following cases:
- upsample_nearest channels first
- upsample_bicubic channels first/last
### Results for this PR
<details>
<summary>
Benchmark results between 8518b0e (master) and 73137d8 (this PR)
</summary>
```
Description:
- 20210331-092940_pth_nightly_results_1.9.0a0+git8518b0e.6
- 20210331-092940_pth_nightly_results_1.9.0a0+git8518b0e.1
- 20210331-092940_pr_results_1.9.0a0+git73137d8.6
- 20210331-092940_pr_results_1.9.0a0+git73137d8.1
[---------- upsample_bilinear2d channels_first contiguous torch.float32 ----------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 331.8 | 334.6
[1, 3, 320, 320] -> (512, 512) | 1261.7 | 1271.5
[32, 128, 64, 64] -> (32, 32) | 10164.6 | 10251.4
[32, 128, 64, 64] -> (128, 128) | 195966.1 | 197141.8
[1, 3, 500, 500] -> (256, 256) | 347.7 | 348.3
[1, 3, 500, 500] -> (800, 800) | 3044.9 | 3071.4
6 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 76.1 | 77.0
[1, 3, 320, 320] -> (512, 512) | 244.8 | 247.6
[32, 128, 64, 64] -> (32, 32) | 2329.4 | 2315.8
[32, 128, 64, 64] -> (128, 128) | 47855.3 | 49047.7
[1, 3, 500, 500] -> (256, 256) | 78.1 | 78.7
[1, 3, 500, 500] -> (800, 800) | 569.3 | 575.6
Times are in microseconds (us).
[------- upsample_bilinear2d channels_first non-contiguous torch.float32 --------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 339.0 | 340.3
[1, 3, 320, 320] -> (512, 512) | 1266.1 | 1277.3
[1, 3, 500, 500] -> (256, 256) | 348.8 | 351.3
[1, 3, 500, 500] -> (800, 800) | 3054.5 | 3077.3
6 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 76.6 | 77.4
[1, 3, 320, 320] -> (512, 512) | 246.0 | 248.1
[1, 3, 500, 500] -> (256, 256) | 78.3 | 79.5
[1, 3, 500, 500] -> (800, 800) | 572.2 | 580.0
Times are in microseconds (us).
[--------- upsample_bilinear2d channels_last non-contiguous torch.float32 --------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 965.4 | 964.9
[1, 3, 320, 320] -> (512, 512) | 3856.2 | 3866.8
[32, 128, 64, 64] -> (32, 32) | 5808.3 | 5812.8
[32, 128, 64, 64] -> (128, 128) | 99575.2 | 97226.2
[2, 128, 64, 46] -> (32, 32) | 110.5 | 109.0
[2, 128, 64, 46] -> (128, 128) | 1662.3 | 1612.0
[1, 128, 64, 46] -> (32, 32) | 55.6 | 55.5
[1, 128, 64, 46] -> (128, 128) | 467.0 | 463.9
[1, 3, 500, 500] -> (256, 256) | 967.7 | 966.7
[1, 3, 500, 500] -> (800, 800) | 9394.7 | 9436.6
6 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 962.2 | 965.4
[1, 3, 320, 320] -> (512, 512) | 3844.3 | 3844.3
[32, 128, 64, 64] -> (32, 32) | 2270.0 | 2267.6
[32, 128, 64, 64] -> (128, 128) | 31909.7 | 32106.5
[2, 128, 64, 46] -> (32, 32) | 61.3 | 59.9
[2, 128, 64, 46] -> (128, 128) | 912.3 | 893.5
[1, 128, 64, 46] -> (32, 32) | 55.5 | 55.3
[1, 128, 64, 46] -> (128, 128) | 467.0 | 466.4
[1, 3, 500, 500] -> (256, 256) | 967.2 | 971.1
[1, 3, 500, 500] -> (800, 800) | 9383.2 | 9417.4
Times are in microseconds (us).
[------ upsample_linear1d channels_first contiguous torch.float32 -------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ---------------------------------------------------------------
[4, 512, 320] -> [256] | 513.5 | 521.8
[4, 512, 320] -> [512] | 999.0 | 1011.8
6 threads: ---------------------------------------------------------------
[4, 512, 320] -> [256] | 103.7 | 104.9
[4, 512, 320] -> [512] | 192.2 | 194.9
Times are in microseconds (us).
[------------- upsample_trilinear3d channels_first contiguous torch.float32 -------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 5.4 | 5.5
[1, 3, 16, 320, 320] -> [32, 512, 512] | 111.2 | 111.1
6 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 1.1 | 1.0
[1, 3, 16, 320, 320] -> [32, 512, 512] | 23.4 | 23.2
Times are in milliseconds (ms).
[----------- upsample_trilinear3d channels_last non-contiguous torch.float32 ------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 13521.9 | 12939.9
[1, 3, 16, 320, 320] -> [32, 512, 512] | 244561.3 | 236595.6
[1, 16, 32, 64, 64] -> [16, 32, 32] | 362.2 | 365.5
[1, 16, 32, 64, 64] -> [64, 128, 128] | 38141.4 | 37957.7
6 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 12980.4 | 12962.7
[1, 3, 16, 320, 320] -> [32, 512, 512] | 236256.4 | 236364.5
[1, 16, 32, 64, 64] -> [16, 32, 32] | 367.9 | 393.2
[1, 16, 32, 64, 64] -> [64, 128, 128] | 38222.5 | 38198.3
Times are in microseconds (us).
[----------- upsample_nearest2d channels_first contiguous torch.float32 ----------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 1205.7 | 107.2
[1, 3, 320, 320] -> (512, 512) | 4793.5 | 357.7
[32, 128, 64, 64] -> (32, 32) | 26550.0 | 6227.1
[32, 128, 64, 64] -> (128, 128) | 341140.3 | 116404.4
[1, 3, 500, 500] -> (256, 256) | 1208.6 | 122.9
[1, 3, 500, 500] -> (800, 800) | 11648.0 | 848.1
6 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 220.5 | 32.6
[1, 3, 320, 320] -> (512, 512) | 865.4 | 78.1
[32, 128, 64, 64] -> (32, 32) | 4890.9 | 2201.2
[32, 128, 64, 64] -> (128, 128) | 73533.8 | 32315.4
[1, 3, 500, 500] -> (256, 256) | 222.3 | 35.0
[1, 3, 500, 500] -> (800, 800) | 2107.5 | 170.7
Times are in microseconds (us).
[----------- upsample_nearest2d channels_first contiguous torch.uint8 -----------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 1457.0 | 310.7
[1, 3, 320, 320] -> (512, 512) | 5808.0 | 1196.6
[1, 3, 500, 500] -> (256, 256) | 1460.9 | 312.7
[1, 3, 500, 500] -> (800, 800) | 14094.3 | 2903.5
6 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 264.8 | 66.8
[1, 3, 320, 320] -> (512, 512) | 1046.0 | 228.9
[1, 3, 500, 500] -> (256, 256) | 266.0 | 68.0
[1, 3, 500, 500] -> (800, 800) | 2546.6 | 535.8
Times are in microseconds (us).
[-------- upsample_nearest2d channels_first non-contiguous torch.float32 --------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 1284.3 | 109.9
[1, 3, 320, 320] -> (512, 512) | 4870.0 | 361.6
[1, 3, 500, 500] -> (256, 256) | 1482.8 | 123.3
[1, 3, 500, 500] -> (800, 800) | 12050.3 | 858.8
6 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 240.2 | 32.8
[1, 3, 320, 320] -> (512, 512) | 886.1 | 78.4
[1, 3, 500, 500] -> (256, 256) | 274.9 | 34.9
[1, 3, 500, 500] -> (800, 800) | 2188.8 | 174.0
Times are in microseconds (us).
[--------- upsample_nearest2d channels_first non-contiguous torch.uint8 ---------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 1501.9 | 312.2
[1, 3, 320, 320] -> (512, 512) | 5853.4 | 1202.1
[1, 3, 500, 500] -> (256, 256) | 1574.0 | 313.9
[1, 3, 500, 500] -> (800, 800) | 14210.2 | 2904.5
6 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 277.2 | 67.2
[1, 3, 320, 320] -> (512, 512) | 1059.8 | 228.9
[1, 3, 500, 500] -> (256, 256) | 292.2 | 68.1
[1, 3, 500, 500] -> (800, 800) | 2574.4 | 536.2
Times are in microseconds (us).
[--------- upsample_nearest2d channels_last non-contiguous torch.float32 ---------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 746.0 | 751.1
[1, 3, 320, 320] -> (512, 512) | 2967.6 | 2979.2
[32, 128, 64, 64] -> (32, 32) | 3408.5 | 3379.0
[32, 128, 64, 64] -> (128, 128) | 90166.4 | 90023.0
[2, 128, 64, 46] -> (32, 32) | 74.8 | 74.5
[2, 128, 64, 46] -> (128, 128) | 1591.2 | 1594.3
[1, 128, 64, 46] -> (32, 32) | 39.3 | 39.2
[1, 128, 64, 46] -> (128, 128) | 420.3 | 419.1
[1, 3, 500, 500] -> (256, 256) | 751.6 | 756.3
[1, 3, 500, 500] -> (800, 800) | 7222.2 | 7268.6
6 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 144.9 | 140.1
[1, 3, 320, 320] -> (512, 512) | 560.7 | 540.6
[32, 128, 64, 64] -> (32, 32) | 1418.1 | 1418.6
[32, 128, 64, 64] -> (128, 128) | 28158.4 | 26411.4
[2, 128, 64, 46] -> (32, 32) | 18.4 | 17.8
[2, 128, 64, 46] -> (128, 128) | 532.3 | 552.0
[1, 128, 64, 46] -> (32, 32) | 13.9 | 13.6
[1, 128, 64, 46] -> (128, 128) | 81.3 | 82.9
[1, 3, 500, 500] -> (256, 256) | 145.9 | 141.6
[1, 3, 500, 500] -> (800, 800) | 1363.4 | 1316.2
Times are in microseconds (us).
[---------- upsample_nearest2d channels_last non-contiguous torch.uint8 ----------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 795.7 | 824.1
[1, 3, 320, 320] -> (512, 512) | 3163.4 | 3274.8
[32, 128, 64, 64] -> (32, 32) | 798.8 | 812.2
[32, 128, 64, 64] -> (128, 128) | 25259.6 | 25453.1
[2, 128, 64, 46] -> (32, 32) | 39.3 | 39.9
[2, 128, 64, 46] -> (128, 128) | 493.7 | 499.9
[1, 128, 64, 46] -> (32, 32) | 22.6 | 22.9
[1, 128, 64, 46] -> (128, 128) | 249.7 | 254.0
[32, 64, 128, 64] -> (32, 32) | 475.3 | 507.4
[32, 64, 128, 64] -> (128, 128) | 13709.7 | 13767.5
[1, 3, 500, 500] -> (256, 256) | 804.0 | 827.6
[1, 3, 500, 500] -> (800, 800) | 7764.9 | 7982.7
6 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 150.1 | 151.4
[1, 3, 320, 320] -> (512, 512) | 589.5 | 592.6
[32, 128, 64, 64] -> (32, 32) | 141.3 | 194.5
[32, 128, 64, 64] -> (128, 128) | 6916.5 | 7445.0
[2, 128, 64, 46] -> (32, 32) | 10.0 | 12.5
[2, 128, 64, 46] -> (128, 128) | 95.8 | 141.1
[1, 128, 64, 46] -> (32, 32) | 8.1 | 10.0
[1, 128, 64, 46] -> (128, 128) | 52.5 | 74.3
[32, 64, 128, 64] -> (32, 32) | 79.8 | 123.7
[32, 64, 128, 64] -> (128, 128) | 3639.9 | 4087.9
[1, 3, 500, 500] -> (256, 256) | 150.7 | 152.2
[1, 3, 500, 500] -> (800, 800) | 1430.9 | 1440.7
Times are in microseconds (us).
[------ upsample_nearest1d channels_first contiguous torch.float32 ------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ---------------------------------------------------------------
[4, 512, 320] -> [256] | 1601.7 | 241.7
[4, 512, 320] -> [512] | 3188.5 | 435.7
6 threads: ---------------------------------------------------------------
[4, 512, 320] -> [256] | 291.9 | 53.3
[4, 512, 320] -> [512] | 577.8 | 88.1
Times are in microseconds (us).
[------- upsample_nearest1d channels_first contiguous torch.uint8 -------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ---------------------------------------------------------------
[4, 512, 320] -> [256] | 2010.1 | 532.3
[4, 512, 320] -> [512] | 3999.7 | 1011.4
6 threads: ---------------------------------------------------------------
[4, 512, 320] -> [256] | 364.2 | 104.6
[4, 512, 320] -> [512] | 722.8 | 193.5
Times are in microseconds (us).
[-------------- upsample_nearest3d channels_first contiguous torch.float32 --------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 14801.0 | 977.5
[1, 3, 16, 320, 320] -> [32, 512, 512] | 217368.5 | 41577.3
6 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 2670.3 | 210.7
[1, 3, 16, 320, 320] -> [32, 512, 512] | 42023.6 | 10971.6
Times are in microseconds (us).
[--------------- upsample_nearest3d channels_first contiguous torch.uint8 ---------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 17151.7 | 3195.8
[1, 3, 16, 320, 320] -> [32, 512, 512] | 221221.0 | 50524.5
6 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 3085.3 | 588.6
[1, 3, 16, 320, 320] -> [32, 512, 512] | 39842.0 | 9141.0
Times are in microseconds (us).
[------------ upsample_nearest3d channels_last non-contiguous torch.float32 -------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 7694.1 | 7729.0
[1, 3, 16, 320, 320] -> [32, 512, 512] | 138104.6 | 138158.0
[1, 16, 32, 64, 64] -> [16, 32, 32] | 251.1 | 252.4
[1, 16, 32, 64, 64] -> [64, 128, 128] | 28991.5 | 28882.8
6 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 1398.3 | 1402.6
[1, 3, 16, 320, 320] -> [32, 512, 512] | 28056.5 | 28123.2
[1, 16, 32, 64, 64] -> [16, 32, 32] | 50.8 | 51.1
[1, 16, 32, 64, 64] -> [64, 128, 128] | 7595.7 | 7540.7
Times are in microseconds (us).
[------------- upsample_nearest3d channels_last non-contiguous torch.uint8 --------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 8147.8 | 8176.2
[1, 3, 16, 320, 320] -> [32, 512, 512] | 114658.1 | 114992.7
[1, 16, 32, 64, 64] -> [16, 32, 32] | 364.3 | 356.0
[1, 16, 32, 64, 64] -> [64, 128, 128] | 17276.0 | 16331.0
6 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 1469.4 | 1476.1
[1, 3, 16, 320, 320] -> [32, 512, 512] | 20647.1 | 20722.6
[1, 16, 32, 64, 64] -> [16, 32, 32] | 69.7 | 68.4
[1, 16, 32, 64, 64] -> [64, 128, 128] | 3125.7 | 2948.2
Times are in microseconds (us).
[----------- upsample_bicubic2d channels_first contiguous torch.float32 ----------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 5961.0 | 1680.2
[1, 3, 320, 320] -> (512, 512) | 23803.7 | 6591.0
[32, 128, 64, 64] -> (32, 32) | 620609.4 | 37981.6
[32, 128, 64, 64] -> (128, 128) | 10120286.1 | 646305.5
[1, 3, 500, 500] -> (256, 256) | 6005.4 | 1694.6
[1, 3, 500, 500] -> (800, 800) | 58271.9 | 16047.6
6 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 6218.5 | 347.1
[1, 3, 320, 320] -> (512, 512) | 24144.6 | 1253.4
[32, 128, 64, 64] -> (32, 32) | 612762.5 | 6934.8
[32, 128, 64, 64] -> (128, 128) | 9906221.2 | 127411.1
[1, 3, 500, 500] -> (256, 256) | 6241.9 | 350.2
[1, 3, 500, 500] -> (800, 800) | 59052.2 | 2984.8
Times are in microseconds (us).
[-------- upsample_bicubic2d channels_first non-contiguous torch.float32 --------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 6050.9 | 1694.3
[1, 3, 320, 320] -> (512, 512) | 23897.1 | 6607.9
[1, 3, 500, 500] -> (256, 256) | 6282.8 | 1693.9
[1, 3, 500, 500] -> (800, 800) | 58608.1 | 16061.0
6 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 6243.7 | 347.6
[1, 3, 320, 320] -> (512, 512) | 24779.9 | 1253.8
[1, 3, 500, 500] -> (256, 256) | 6348.0 | 350.7
[1, 3, 500, 500] -> (800, 800) | 59255.6 | 2983.8
Times are in microseconds (us).
[--------- upsample_bicubic2d channels_last non-contiguous torch.float32 ---------]
| 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 6117.0 | 1688.2
[1, 3, 320, 320] -> (512, 512) | 23967.4 | 6644.8
[32, 128, 64, 64] -> (32, 32) | 679574.0 | 78477.4
[32, 128, 64, 64] -> (128, 128) | 10334325.5 | 817649.0
[2, 128, 64, 46] -> (32, 32) | 9828.0 | 4449.2
[2, 128, 64, 46] -> (128, 128) | 134989.3 | 42817.4
[1, 128, 64, 46] -> (32, 32) | 4508.2 | 2228.6
[1, 128, 64, 46] -> (128, 128) | 59404.9 | 21400.4
[1, 3, 500, 500] -> (256, 256) | 6359.0 | 1712.7
[1, 3, 500, 500] -> (800, 800) | 58717.6 | 16086.6
6 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 6922.0 | 349.5
[1, 3, 320, 320] -> (512, 512) | 24916.5 | 1260.2
[32, 128, 64, 64] -> (32, 32) | 454240.4 | 16491.4
[32, 128, 64, 64] -> (128, 128) | 7198101.5 | 159921.9
[2, 128, 64, 46] -> (32, 32) | 10082.8 | 891.1
[2, 128, 64, 46] -> (128, 128) | 151037.0 | 7704.2
[1, 128, 64, 46] -> (32, 32) | 4325.5 | 633.9
[1, 128, 64, 46] -> (128, 128) | 62400.4 | 3853.5
[1, 3, 500, 500] -> (256, 256) | 6374.9 | 354.9
[1, 3, 500, 500] -> (800, 800) | 58638.8 | 2992.0
Times are in microseconds (us).
Intermediate benchmark sources:
- results/20210331-092940_pth_nightly_results_1.9.0a0+git8518b0e.log.save
- results/20210331-092940_pr_results_1.9.0a0+git73137d8.log.save
```
[Source file](https://raw.githubusercontent.com/vfdev-5/interpolate-tensoriterator/master/step_seven/results/20210326-061238_pr_1.9.0a0%2Bgita17040a_vs_pth_1.9.0a0%2Bgit8518b0e_results.md)
</details>
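
To read the tables above as speed-up factors, a small helper like the following can be used. It is only a sketch: the rows are copied by hand from the results above (times in microseconds, `[1, 3, 320, 320] -> (256, 256)`), and the helper name `speedup` is hypothetical.

```python
# (case, master time in us, PR time in us), copied from the benchmark tables above
rows = [
    ("upsample_nearest2d channels_first contiguous float32, 1 thread", 1205.7, 107.2),
    ("upsample_bicubic2d channels_first contiguous float32, 1 thread", 5961.0, 1680.2),
    ("upsample_bicubic2d channels_last float32, 6 threads", 6922.0, 349.5),
    ("upsample_bilinear2d channels_first contiguous float32, 1 thread", 331.8, 334.6),
]


def speedup(master_us, pr_us):
    """Return how many times faster the PR is than master (values > 1 mean faster)."""
    return master_us / pr_us


if __name__ == "__main__":
    for case, master_us, pr_us in rows:
        print(f"{case}: {speedup(master_us, pr_us):.2f}x")
```

Values close to 1 (as for bilinear channels first) indicate that the case is essentially unchanged by this PR.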
This description is based on the benchmarks and the code from [here](https://github.com/vfdev-5/interpolate-tensoriterator/tree/master/step_seven).
Joint work with Francisco Massa (fmassa).
---
### Appendix: Results without the original 2d/3d channels last implementation
<details>
<summary>
Quick benchmark results between 8518b0e (master) and [this branch](https://github.com/pytorch/pytorch/compare/master...Quansight:vfdev-5/generic-upsample-tensor-iterator)
</summary>
```
Description:
- 20212303-061238_pth_nightly_results_1.9.0a0+git8518b0e.opencv.6
- 20212303-061238_pth_nightly_results_1.9.0a0+git8518b0e.opencv.1
- 20212303-061238_pr_results_1.9.0a0+gite3a9544.opencv.6
- 20212303-061238_pr_results_1.9.0a0+gite3a9544.opencv.1
[----------------- upsample_bilinear2d channels_first contiguous -----------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 348.5 | 331.7
[1, 3, 320, 320] -> (512, 512) | 1254.0 | 1178.1
[32, 128, 64, 64] -> (32, 32) | 10409.4 | 10009.1
[32, 128, 64, 64] -> (128, 128) | 210175.8 | 204542.5
[1, 3, 500, 500] -> (256, 256) | 348.5 | 329.5
[1, 3, 500, 500] -> (800, 800) | 3079.8 | 2890.1
6 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 76.4 | 73.4
[1, 3, 320, 320] -> (512, 512) | 247.1 | 232.0
[32, 128, 64, 64] -> (32, 32) | 2371.1 | 2340.5
[32, 128, 64, 64] -> (128, 128) | 62182.6 | 54089.9
[1, 3, 500, 500] -> (256, 256) | 78.2 | 75.8
[1, 3, 500, 500] -> (800, 800) | 569.0 | 541.3
Times are in microseconds (us).
[-------------- upsample_bilinear2d channels_first non-contiguous ---------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 340.5 | 321.9
[1, 3, 320, 320] -> (512, 512) | 1256.1 | 1179.0
[1, 3, 500, 500] -> (256, 256) | 351.4 | 332.0
[1, 3, 500, 500] -> (800, 800) | 3089.1 | 2898.6
6 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 77.2 | 75.0
[1, 3, 320, 320] -> (512, 512) | 246.6 | 232.7
[1, 3, 500, 500] -> (256, 256) | 78.6 | 75.4
[1, 3, 500, 500] -> (800, 800) | 576.3 | 539.6
Times are in microseconds (us).
[------------------------ upsample_bilinear2d channels_last non-contiguous ------------------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544 | opencv 4.5.1
1 threads: -----------------------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 971.9 | 1324.6 | 99.6
[1, 3, 320, 320] -> (512, 512) | 3867.8 | 5329.9 | 271.5
[32, 128, 64, 64] -> (32, 32) | 6010.6 | 6304.3 |
[32, 128, 64, 64] -> (128, 128) | 112299.9 | 116956.8 |
[2, 128, 64, 46] -> (32, 32) | 110.1 | 133.2 |
[2, 128, 64, 46] -> (128, 128) | 1690.1 | 1838.6 |
[1, 128, 64, 46] -> (32, 32) | 55.8 | 73.4 | 185.8
[1, 128, 64, 46] -> (128, 128) | 474.5 | 684.9 | 1445.7
[1, 3, 500, 500] -> (256, 256) | 972.9 | 1343.0 | 149.5
[1, 3, 500, 500] -> (800, 800) | 9460.2 | 12925.8 | 685.1
6 threads: -----------------------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 956.6 | 260.1 | 27.1
[1, 3, 320, 320] -> (512, 512) | 3867.3 | 967.1 | 63.6
[32, 128, 64, 64] -> (32, 32) | 2489.4 | 2427.0 |
[32, 128, 64, 64] -> (128, 128) | 37462.1 | 41329.8 |
[2, 128, 64, 46] -> (32, 32) | 61.2 | 38.9 |
[2, 128, 64, 46] -> (128, 128) | 904.2 | 652.0 |
[1, 128, 64, 46] -> (32, 32) | 57.1 | 32.0 | 191.1
[1, 128, 64, 46] -> (128, 128) | 491.4 | 138.1 | 1485.8
[1, 3, 500, 500] -> (256, 256) | 977.0 | 257.8 | 36.6
[1, 3, 500, 500] -> (800, 800) | 9470.0 | 2696.0 | 142.8
Times are in microseconds (us).
[------------- upsample_linear1d channels_first contiguous --------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: ---------------------------------------------------------------
[4, 512, 320] -> [256] | 516.5 | 524.7
[4, 512, 320] -> [512] | 993.8 | 1008.0
6 threads: ---------------------------------------------------------------
[4, 512, 320] -> [256] | 104.3 | 105.4
[4, 512, 320] -> [512] | 193.5 | 195.6
Times are in microseconds (us).
[-------------------- upsample_trilinear3d channels_first contiguous --------------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 5.5 | 11.5
[1, 3, 16, 320, 320] -> [32, 512, 512] | 116.3 | 213.1
6 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 1.1 | 2.1
[1, 3, 16, 320, 320] -> [32, 512, 512] | 36.1 | 47.2
Times are in milliseconds (ms).
[------------------ upsample_trilinear3d channels_last non-contiguous -------------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 13.1 | 19.9
[1, 3, 16, 320, 320] -> [32, 512, 512] | 242.3 | 349.4
6 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 13.1 | 4.4
[1, 3, 16, 320, 320] -> [32, 512, 512] | 242.4 | 87.2
Times are in milliseconds (ms).
[------------------ upsample_nearest2d channels_first contiguous -----------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 1194.5 | 107.8
[1, 3, 320, 320] -> (512, 512) | 4813.8 | 365.5
[32, 128, 64, 64] -> (32, 32) | 26745.6 | 6280.6
[32, 128, 64, 64] -> (128, 128) | 357686.7 | 129032.9
[1, 3, 500, 500] -> (256, 256) | 1205.9 | 123.8
[1, 3, 500, 500] -> (800, 800) | 11770.3 | 879.2
6 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 220.2 | 32.7
[1, 3, 320, 320] -> (512, 512) | 867.2 | 78.7
[32, 128, 64, 64] -> (32, 32) | 5789.6 | 2241.8
[32, 128, 64, 64] -> (128, 128) | 89125.3 | 41881.3
[1, 3, 500, 500] -> (256, 256) | 224.3 | 34.8
[1, 3, 500, 500] -> (800, 800) | 2182.8 | 176.6
Times are in microseconds (us).
[--------------- upsample_nearest2d channels_first non-contiguous ---------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 1279.5 | 110.2
[1, 3, 320, 320] -> (512, 512) | 4908.1 | 367.1
[1, 3, 500, 500] -> (256, 256) | 1488.1 | 123.4
[1, 3, 500, 500] -> (800, 800) | 12186.4 | 879.3
6 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 241.8 | 32.6
[1, 3, 320, 320] -> (512, 512) | 889.0 | 79.2
[1, 3, 500, 500] -> (256, 256) | 279.2 | 35.6
[1, 3, 500, 500] -> (800, 800) | 2226.5 | 174.3
Times are in microseconds (us).
[------------------------ upsample_nearest2d channels_last non-contiguous -------------------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544 | opencv 4.5.1
1 threads: -----------------------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 752.1 | 487.2 | 75.5
[1, 3, 320, 320] -> (512, 512) | 2992.6 | 1880.0 | 251.4
[32, 128, 64, 64] -> (32, 32) | 3458.6 | 3466.5 |
[32, 128, 64, 64] -> (128, 128) | 102350.7 | 103919.4 |
[2, 128, 64, 46] -> (32, 32) | 75.2 | 85.2 |
[2, 128, 64, 46] -> (128, 128) | 1637.0 | 1690.4 |
[1, 128, 64, 46] -> (32, 32) | 39.6 | 47.2 | 37.6
[1, 128, 64, 46] -> (128, 128) | 426.3 | 449.0 | 412.4
[1, 3, 500, 500] -> (256, 256) | 757.5 | 495.5 | 85.0
[1, 3, 500, 500] -> (800, 800) | 7281.4 | 4532.6 | 622.8
6 threads: -----------------------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 139.3 | 104.1 | 75.7
[1, 3, 320, 320] -> (512, 512) | 535.5 | 361.2 | 73.0
[32, 128, 64, 64] -> (32, 32) | 1518.6 | 1458.2 |
[32, 128, 64, 64] -> (128, 128) | 37117.7 | 40142.4 |
[2, 128, 64, 46] -> (32, 32) | 17.6 | 26.6 |
[2, 128, 64, 46] -> (128, 128) | 537.6 | 629.4 |
[1, 128, 64, 46] -> (32, 32) | 13.7 | 22.1 | 38.8
[1, 128, 64, 46] -> (128, 128) | 83.6 | 94.5 | 420.2
[1, 3, 500, 500] -> (256, 256) | 140.8 | 104.9 | 87.8
[1, 3, 500, 500] -> (800, 800) | 1317.8 | 853.8 | 139.7
Times are in microseconds (us).
[------------- upsample_nearest1d channels_first contiguous -------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: ---------------------------------------------------------------
[4, 512, 320] -> [256] | 1594.3 | 247.4
[4, 512, 320] -> [512] | 3222.6 | 440.4
6 threads: ---------------------------------------------------------------
[4, 512, 320] -> [256] | 294.4 | 53.7
[4, 512, 320] -> [512] | 575.0 | 88.5
Times are in microseconds (us).
[--------------------- upsample_nearest3d channels_first contiguous ---------------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 14952.7 | 1005.7
[1, 3, 16, 320, 320] -> [32, 512, 512] | 224955.6 | 46228.0
6 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 2887.2 | 206.2
[1, 3, 16, 320, 320] -> [32, 512, 512] | 56872.0 | 13566.3
Times are in microseconds (us).
[------------------- upsample_nearest3d channels_last non-contiguous --------------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 7772.3 | 4770.9
[1, 3, 16, 320, 320] -> [32, 512, 512] | 144655.1 | 108605.0
6 threads: -------------------------------------------------------------------------------
[1, 3, 16, 320, 320] -> [8, 256, 256] | 1401.9 | 877.7
[1, 3, 16, 320, 320] -> [32, 512, 512] | 35939.6 | 28621.5
Times are in microseconds (us).
[------------------ upsample_bicubic2d channels_first contiguous -----------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 6038.7 | 2340.4
[1, 3, 320, 320] -> (512, 512) | 24040.6 | 9205.9
[32, 128, 64, 64] -> (32, 32) | 471016.3 | 52059.1
[32, 128, 64, 64] -> (128, 128) | 7705594.5 | 884743.9
[1, 3, 500, 500] -> (256, 256) | 6061.5 | 2361.9
[1, 3, 500, 500] -> (800, 800) | 58940.7 | 22401.8
6 threads: ------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 6594.3 | 466.5
[1, 3, 320, 320] -> (512, 512) | 25361.5 | 1729.1
[32, 128, 64, 64] -> (32, 32) | 487783.5 | 11550.0
[32, 128, 64, 64] -> (128, 128) | 7963636.6 | 196017.3
[1, 3, 500, 500] -> (256, 256) | 6443.8 | 464.1
[1, 3, 500, 500] -> (800, 800) | 61891.9 | 4257.2
Times are in microseconds (us).
[--------------- upsample_bicubic2d channels_first non-contiguous ---------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 6116.7 | 2357.0
[1, 3, 320, 320] -> (512, 512) | 24182.0 | 9213.9
[1, 3, 500, 500] -> (256, 256) | 6349.6 | 2358.5
[1, 3, 500, 500] -> (800, 800) | 59365.2 | 22431.2
6 threads: -----------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 7155.1 | 464.6
[1, 3, 320, 320] -> (512, 512) | 24566.8 | 1712.4
[1, 3, 500, 500] -> (256, 256) | 7217.5 | 466.6
[1, 3, 500, 500] -> (800, 800) | 59880.2 | 4148.8
Times are in microseconds (us).
[------------------------ upsample_bicubic2d channels_last non-contiguous -------------------------]
| 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544 | opencv 4.5.1
1 threads: -----------------------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 6184.3 | 2360.0 | 215.0
[1, 3, 320, 320] -> (512, 512) | 24499.7 | 9231.1 | 510.7
[32, 128, 64, 64] -> (32, 32) | 548304.5 | 93517.8 |
[32, 128, 64, 64] -> (128, 128) | 7810958.3 | 1086334.6 |
[2, 128, 64, 46] -> (32, 32) | 10883.4 | 5594.9 |
[2, 128, 64, 46] -> (128, 128) | 153253.2 | 57071.2 |
[1, 128, 64, 46] -> (32, 32) | 4519.4 | 2826.5 | 619.7
[1, 128, 64, 46] -> (128, 128) | 61339.7 | 28470.7 | 3654.5
[1, 3, 500, 500] -> (256, 256) | 6444.8 | 2389.9 | 292.9
[1, 3, 500, 500] -> (800, 800) | 59448.0 | 22479.1 | 1316.9
6 threads: -----------------------------------------------------------------------------------------
[1, 3, 320, 320] -> (256, 256) | 6370.1 | 464.9 | 61.3
[1, 3, 320, 320] -> (512, 512) | 25365.6 | 1767.5 | 145.7
[32, 128, 64, 64] -> (32, 32) | 502888.7 | 22016.3 |
[32, 128, 64, 64] -> (128, 128) | 8072918.9 | 234567.0 |
[2, 128, 64, 46] -> (32, 32) | 11171.4 | 1049.5 |
[2, 128, 64, 46] -> (128, 128) | 152612.5 | 11264.8 |
[1, 128, 64, 46] -> (32, 32) | 4359.3 | 791.4 | 651.1
[1, 128, 64, 46] -> (128, 128) | 61346.5 | 7563.9 | 3765.2
[1, 3, 500, 500] -> (256, 256) | 6644.4 | 469.7 | 77.4
[1, 3, 500, 500] -> (800, 800) | 59947.2 | 4154.3 | 313.2
Times are in microseconds (us).
Intermediate benchmark sources:
- results/20212303-061238_pth_nightly_results_1.9.0a0+git8518b0e.log.save.opencv
- results/20212303-061238_pr_results_1.9.0a0+gite3a9544.log.save.opencv
```
[Source file](https://raw.githubusercontent.com/vfdev-5/interpolate-tensoriterator/master/step_seven/results/20212303-061238_pr_1.9.0a0%2Bgite3a9544_vs_pth_1.9.0a0%2Bgit8518b0e_results.opencv.md)
</details>
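
The 1-thread slowdown that motivated keeping the original channels last implementation can be picked out of the appendix tables with a simple filter. The sketch below uses a few rows copied by hand from the appendix (times in microseconds); the 10% tolerance and the function name `find_regressions` are assumptions chosen for illustration only.

```python
# (op, shape, threads, master time in us, generic-branch time in us)
appendix_rows = [
    ("upsample_bilinear2d channels_last", "[1, 3, 320, 320] -> (256, 256)", 1, 971.9, 1324.6),
    ("upsample_bilinear2d channels_last", "[1, 3, 320, 320] -> (256, 256)", 6, 956.6, 260.1),
    ("upsample_nearest2d channels_last", "[1, 3, 320, 320] -> (256, 256)", 1, 752.1, 487.2),
    ("upsample_nearest2d channels_last", "[1, 128, 64, 46] -> (32, 32)", 1, 39.6, 47.2),
]


def find_regressions(rows, tolerance=0.10):
    """Return rows where the branch is slower than master by more than `tolerance`."""
    return [
        row for row in rows
        if row[4] > row[3] * (1.0 + tolerance)
    ]


if __name__ == "__main__":
    for op, shape, threads, master_us, branch_us in find_regressions(appendix_rows):
        print(f"{op} {shape}, {threads} thread(s): {branch_us / master_us:.2f}x slower")
```

With these sample rows, the flagged entries are all 1-thread channels last cases, while the 6-thread bilinear case is faster on the generic branch.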
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54500
Reviewed By: glaringlee
Differential Revision: D27463566
Pulled By: fmassa
fbshipit-source-id: ceac3a8cee0eeb1a4ddd9344accffcc65449a49a