pytorch
e5f66f00 - Optimized generic interpolation using TensorIterator (keeps original 2d/3d channels last impl) (#54500)

Commit
3 years ago
Optimized generic interpolation using TensorIterator (keeps original 2d/3d channels last impl) (#54500)

Summary:
Related to https://github.com/pytorch/pytorch/issues/10482

A follow-up PR to https://github.com/pytorch/pytorch/pull/51653/

Description:
- Replaces nearest/linear/cubic implementations with generic interpolation implementation
- Retains 2d/3d channels last implementation due to perf slowdown for 1 thread (see below appendix note)

Speed-ups for cases:
- upsample_nearest channels first
- upsample_bicubic channels first/last

### Results for this PR

<details>
<summary>

Benchmark results between 8518b0e (master) and 73137d8 (this PR)

</summary>

```
Description:
- 20210331-092940_pth_nightly_results_1.9.0a0+git8518b0e.6
- 20210331-092940_pth_nightly_results_1.9.0a0+git8518b0e.1
- 20210331-092940_pr_results_1.9.0a0+git73137d8.6
- 20210331-092940_pr_results_1.9.0a0+git73137d8.1

[---------- upsample_bilinear2d channels_first contiguous torch.float32 ----------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              331.8 |              334.6
   [1, 3, 320, 320] -> (512, 512) |             1261.7 |             1271.5
    [32, 128, 64, 64] -> (32, 32) |            10164.6 |            10251.4
  [32, 128, 64, 64] -> (128, 128) |           195966.1 |           197141.8
   [1, 3, 500, 500] -> (256, 256) |              347.7 |              348.3
   [1, 3, 500, 500] -> (800, 800) |             3044.9 |             3071.4
6 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |               76.1 |               77.0
   [1, 3, 320, 320] -> (512, 512) |              244.8 |              247.6
    [32, 128, 64, 64] -> (32, 32) |             2329.4 |             2315.8
  [32, 128, 64, 64] -> (128, 128) |            47855.3 |            49047.7
   [1, 3, 500, 500] -> (256, 256) |               78.1 |               78.7
   [1, 3, 500, 500] -> (800, 800) |              569.3 |              575.6

Times are in microseconds (us).

[------- upsample_bilinear2d channels_first non-contiguous torch.float32 --------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              339.0 |              340.3
   [1, 3, 320, 320] -> (512, 512) |             1266.1 |             1277.3
   [1, 3, 500, 500] -> (256, 256) |              348.8 |              351.3
   [1, 3, 500, 500] -> (800, 800) |             3054.5 |             3077.3
6 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |               76.6 |               77.4
   [1, 3, 320, 320] -> (512, 512) |              246.0 |              248.1
   [1, 3, 500, 500] -> (256, 256) |               78.3 |               79.5
   [1, 3, 500, 500] -> (800, 800) |              572.2 |              580.0

Times are in microseconds (us).

[--------- upsample_bilinear2d channels_last non-contiguous torch.float32 --------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              965.4 |              964.9
   [1, 3, 320, 320] -> (512, 512) |             3856.2 |             3866.8
    [32, 128, 64, 64] -> (32, 32) |             5808.3 |             5812.8
  [32, 128, 64, 64] -> (128, 128) |            99575.2 |            97226.2
     [2, 128, 64, 46] -> (32, 32) |              110.5 |              109.0
   [2, 128, 64, 46] -> (128, 128) |             1662.3 |             1612.0
     [1, 128, 64, 46] -> (32, 32) |               55.6 |               55.5
   [1, 128, 64, 46] -> (128, 128) |              467.0 |              463.9
   [1, 3, 500, 500] -> (256, 256) |              967.7 |              966.7
   [1, 3, 500, 500] -> (800, 800) |             9394.7 |             9436.6
6 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              962.2 |              965.4
   [1, 3, 320, 320] -> (512, 512) |             3844.3 |             3844.3
    [32, 128, 64, 64] -> (32, 32) |             2270.0 |             2267.6
  [32, 128, 64, 64] -> (128, 128) |            31909.7 |            32106.5
     [2, 128, 64, 46] -> (32, 32) |               61.3 |               59.9
   [2, 128, 64, 46] -> (128, 128) |              912.3 |              893.5
     [1, 128, 64, 46] -> (32, 32) |               55.5 |               55.3
   [1, 128, 64, 46] -> (128, 128) |              467.0 |              466.4
   [1, 3, 500, 500] -> (256, 256) |              967.2 |              971.1
   [1, 3, 500, 500] -> (800, 800) |             9383.2 |             9417.4

Times are in microseconds (us).

[------ upsample_linear1d channels_first contiguous torch.float32 -------]
                         | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ---------------------------------------------------------------
  [4, 512, 320] -> [256] |              513.5 |              521.8
  [4, 512, 320] -> [512] |              999.0 |             1011.8
6 threads: ---------------------------------------------------------------
  [4, 512, 320] -> [256] |              103.7 |              104.9
  [4, 512, 320] -> [512] |              192.2 |              194.9

Times are in microseconds (us).

[------------- upsample_trilinear3d channels_first contiguous torch.float32 -------------]
                                         | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |                5.4 |                5.5
  [1, 3, 16, 320, 320] -> [32, 512, 512] |              111.2 |              111.1
6 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |                1.1 |                1.0
  [1, 3, 16, 320, 320] -> [32, 512, 512] |               23.4 |               23.2

Times are in milliseconds (ms).

[----------- upsample_trilinear3d channels_last non-contiguous torch.float32 ------------]
                                         | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |            13521.9 |            12939.9
  [1, 3, 16, 320, 320] -> [32, 512, 512] |           244561.3 |           236595.6
     [1, 16, 32, 64, 64] -> [16, 32, 32] |              362.2 |              365.5
   [1, 16, 32, 64, 64] -> [64, 128, 128] |            38141.4 |            37957.7
6 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |            12980.4 |            12962.7
  [1, 3, 16, 320, 320] -> [32, 512, 512] |           236256.4 |           236364.5
     [1, 16, 32, 64, 64] -> [16, 32, 32] |              367.9 |              393.2
   [1, 16, 32, 64, 64] -> [64, 128, 128] |            38222.5 |            38198.3

Times are in microseconds (us).

[----------- upsample_nearest2d channels_first contiguous torch.float32 ----------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             1205.7 |              107.2
   [1, 3, 320, 320] -> (512, 512) |             4793.5 |              357.7
    [32, 128, 64, 64] -> (32, 32) |            26550.0 |             6227.1
  [32, 128, 64, 64] -> (128, 128) |           341140.3 |           116404.4
   [1, 3, 500, 500] -> (256, 256) |             1208.6 |              122.9
   [1, 3, 500, 500] -> (800, 800) |            11648.0 |              848.1
6 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              220.5 |               32.6
   [1, 3, 320, 320] -> (512, 512) |              865.4 |               78.1
    [32, 128, 64, 64] -> (32, 32) |             4890.9 |             2201.2
  [32, 128, 64, 64] -> (128, 128) |            73533.8 |            32315.4
   [1, 3, 500, 500] -> (256, 256) |              222.3 |               35.0
   [1, 3, 500, 500] -> (800, 800) |             2107.5 |              170.7

Times are in microseconds (us).

[----------- upsample_nearest2d channels_first contiguous torch.uint8 -----------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             1457.0 |              310.7
   [1, 3, 320, 320] -> (512, 512) |             5808.0 |             1196.6
   [1, 3, 500, 500] -> (256, 256) |             1460.9 |              312.7
   [1, 3, 500, 500] -> (800, 800) |            14094.3 |             2903.5
6 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              264.8 |               66.8
   [1, 3, 320, 320] -> (512, 512) |             1046.0 |              228.9
   [1, 3, 500, 500] -> (256, 256) |              266.0 |               68.0
   [1, 3, 500, 500] -> (800, 800) |             2546.6 |              535.8

Times are in microseconds (us).

[-------- upsample_nearest2d channels_first non-contiguous torch.float32 --------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             1284.3 |              109.9
   [1, 3, 320, 320] -> (512, 512) |             4870.0 |              361.6
   [1, 3, 500, 500] -> (256, 256) |             1482.8 |              123.3
   [1, 3, 500, 500] -> (800, 800) |            12050.3 |              858.8
6 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              240.2 |               32.8
   [1, 3, 320, 320] -> (512, 512) |              886.1 |               78.4
   [1, 3, 500, 500] -> (256, 256) |              274.9 |               34.9
   [1, 3, 500, 500] -> (800, 800) |             2188.8 |              174.0

Times are in microseconds (us).

[--------- upsample_nearest2d channels_first non-contiguous torch.uint8 ---------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             1501.9 |              312.2
   [1, 3, 320, 320] -> (512, 512) |             5853.4 |             1202.1
   [1, 3, 500, 500] -> (256, 256) |             1574.0 |              313.9
   [1, 3, 500, 500] -> (800, 800) |            14210.2 |             2904.5
6 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              277.2 |               67.2
   [1, 3, 320, 320] -> (512, 512) |             1059.8 |              228.9
   [1, 3, 500, 500] -> (256, 256) |              292.2 |               68.1
   [1, 3, 500, 500] -> (800, 800) |             2574.4 |              536.2

Times are in microseconds (us).

[--------- upsample_nearest2d channels_last non-contiguous torch.float32 ---------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              746.0 |              751.1
   [1, 3, 320, 320] -> (512, 512) |             2967.6 |             2979.2
    [32, 128, 64, 64] -> (32, 32) |             3408.5 |             3379.0
  [32, 128, 64, 64] -> (128, 128) |            90166.4 |            90023.0
     [2, 128, 64, 46] -> (32, 32) |               74.8 |               74.5
   [2, 128, 64, 46] -> (128, 128) |             1591.2 |             1594.3
     [1, 128, 64, 46] -> (32, 32) |               39.3 |               39.2
   [1, 128, 64, 46] -> (128, 128) |              420.3 |              419.1
   [1, 3, 500, 500] -> (256, 256) |              751.6 |              756.3
   [1, 3, 500, 500] -> (800, 800) |             7222.2 |             7268.6
6 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              144.9 |              140.1
   [1, 3, 320, 320] -> (512, 512) |              560.7 |              540.6
    [32, 128, 64, 64] -> (32, 32) |             1418.1 |             1418.6
  [32, 128, 64, 64] -> (128, 128) |            28158.4 |            26411.4
     [2, 128, 64, 46] -> (32, 32) |               18.4 |               17.8
   [2, 128, 64, 46] -> (128, 128) |              532.3 |              552.0
     [1, 128, 64, 46] -> (32, 32) |               13.9 |               13.6
   [1, 128, 64, 46] -> (128, 128) |               81.3 |               82.9
   [1, 3, 500, 500] -> (256, 256) |              145.9 |              141.6
   [1, 3, 500, 500] -> (800, 800) |             1363.4 |             1316.2

Times are in microseconds (us).

[---------- upsample_nearest2d channels_last non-contiguous torch.uint8 ----------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              795.7 |              824.1
   [1, 3, 320, 320] -> (512, 512) |             3163.4 |             3274.8
    [32, 128, 64, 64] -> (32, 32) |              798.8 |              812.2
  [32, 128, 64, 64] -> (128, 128) |            25259.6 |            25453.1
     [2, 128, 64, 46] -> (32, 32) |               39.3 |               39.9
   [2, 128, 64, 46] -> (128, 128) |              493.7 |              499.9
     [1, 128, 64, 46] -> (32, 32) |               22.6 |               22.9
   [1, 128, 64, 46] -> (128, 128) |              249.7 |              254.0
    [32, 64, 128, 64] -> (32, 32) |              475.3 |              507.4
  [32, 64, 128, 64] -> (128, 128) |            13709.7 |            13767.5
   [1, 3, 500, 500] -> (256, 256) |              804.0 |              827.6
   [1, 3, 500, 500] -> (800, 800) |             7764.9 |             7982.7
6 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              150.1 |              151.4
   [1, 3, 320, 320] -> (512, 512) |              589.5 |              592.6
    [32, 128, 64, 64] -> (32, 32) |              141.3 |              194.5
  [32, 128, 64, 64] -> (128, 128) |             6916.5 |             7445.0
     [2, 128, 64, 46] -> (32, 32) |               10.0 |               12.5
   [2, 128, 64, 46] -> (128, 128) |               95.8 |              141.1
     [1, 128, 64, 46] -> (32, 32) |                8.1 |               10.0
   [1, 128, 64, 46] -> (128, 128) |               52.5 |               74.3
    [32, 64, 128, 64] -> (32, 32) |               79.8 |              123.7
  [32, 64, 128, 64] -> (128, 128) |             3639.9 |             4087.9
   [1, 3, 500, 500] -> (256, 256) |              150.7 |              152.2
   [1, 3, 500, 500] -> (800, 800) |             1430.9 |             1440.7

Times are in microseconds (us).

[------ upsample_nearest1d channels_first contiguous torch.float32 ------]
                         | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ---------------------------------------------------------------
  [4, 512, 320] -> [256] |             1601.7 |              241.7
  [4, 512, 320] -> [512] |             3188.5 |              435.7
6 threads: ---------------------------------------------------------------
  [4, 512, 320] -> [256] |              291.9 |               53.3
  [4, 512, 320] -> [512] |              577.8 |               88.1

Times are in microseconds (us).

[------- upsample_nearest1d channels_first contiguous torch.uint8 -------]
                         | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ---------------------------------------------------------------
  [4, 512, 320] -> [256] |             2010.1 |              532.3
  [4, 512, 320] -> [512] |             3999.7 |             1011.4
6 threads: ---------------------------------------------------------------
  [4, 512, 320] -> [256] |              364.2 |              104.6
  [4, 512, 320] -> [512] |              722.8 |              193.5

Times are in microseconds (us).

[-------------- upsample_nearest3d channels_first contiguous torch.float32 --------------]
                                         | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |            14801.0 |              977.5
  [1, 3, 16, 320, 320] -> [32, 512, 512] |           217368.5 |            41577.3
6 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |             2670.3 |              210.7
  [1, 3, 16, 320, 320] -> [32, 512, 512] |            42023.6 |            10971.6

Times are in microseconds (us).

[--------------- upsample_nearest3d channels_first contiguous torch.uint8 ---------------]
                                         | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |            17151.7 |             3195.8
  [1, 3, 16, 320, 320] -> [32, 512, 512] |           221221.0 |            50524.5
6 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |             3085.3 |              588.6
  [1, 3, 16, 320, 320] -> [32, 512, 512] |            39842.0 |             9141.0

Times are in microseconds (us).

[------------ upsample_nearest3d channels_last non-contiguous torch.float32 -------------]
                                         | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |             7694.1 |             7729.0
  [1, 3, 16, 320, 320] -> [32, 512, 512] |           138104.6 |           138158.0
     [1, 16, 32, 64, 64] -> [16, 32, 32] |              251.1 |              252.4
   [1, 16, 32, 64, 64] -> [64, 128, 128] |            28991.5 |            28882.8
6 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |             1398.3 |             1402.6
  [1, 3, 16, 320, 320] -> [32, 512, 512] |            28056.5 |            28123.2
     [1, 16, 32, 64, 64] -> [16, 32, 32] |               50.8 |               51.1
   [1, 16, 32, 64, 64] -> [64, 128, 128] |             7595.7 |             7540.7

Times are in microseconds (us).

[------------- upsample_nearest3d channels_last non-contiguous torch.uint8 --------------]
                                         | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |             8147.8 |             8176.2
  [1, 3, 16, 320, 320] -> [32, 512, 512] |           114658.1 |           114992.7
     [1, 16, 32, 64, 64] -> [16, 32, 32] |              364.3 |              356.0
   [1, 16, 32, 64, 64] -> [64, 128, 128] |            17276.0 |            16331.0
6 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |             1469.4 |             1476.1
  [1, 3, 16, 320, 320] -> [32, 512, 512] |            20647.1 |            20722.6
     [1, 16, 32, 64, 64] -> [16, 32, 32] |               69.7 |               68.4
   [1, 16, 32, 64, 64] -> [64, 128, 128] |             3125.7 |             2948.2

Times are in microseconds (us).

[----------- upsample_bicubic2d channels_first contiguous torch.float32 ----------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             5961.0 |             1680.2
   [1, 3, 320, 320] -> (512, 512) |            23803.7 |             6591.0
    [32, 128, 64, 64] -> (32, 32) |           620609.4 |            37981.6
  [32, 128, 64, 64] -> (128, 128) |         10120286.1 |           646305.5
   [1, 3, 500, 500] -> (256, 256) |             6005.4 |             1694.6
   [1, 3, 500, 500] -> (800, 800) |            58271.9 |            16047.6
6 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             6218.5 |              347.1
   [1, 3, 320, 320] -> (512, 512) |            24144.6 |             1253.4
    [32, 128, 64, 64] -> (32, 32) |           612762.5 |             6934.8
  [32, 128, 64, 64] -> (128, 128) |          9906221.2 |           127411.1
   [1, 3, 500, 500] -> (256, 256) |             6241.9 |              350.2
   [1, 3, 500, 500] -> (800, 800) |            59052.2 |             2984.8

Times are in microseconds (us).

[-------- upsample_bicubic2d channels_first non-contiguous torch.float32 --------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             6050.9 |             1694.3
   [1, 3, 320, 320] -> (512, 512) |            23897.1 |             6607.9
   [1, 3, 500, 500] -> (256, 256) |             6282.8 |             1693.9
   [1, 3, 500, 500] -> (800, 800) |            58608.1 |            16061.0
6 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             6243.7 |              347.6
   [1, 3, 320, 320] -> (512, 512) |            24779.9 |             1253.8
   [1, 3, 500, 500] -> (256, 256) |             6348.0 |              350.7
   [1, 3, 500, 500] -> (800, 800) |            59255.6 |             2983.8

Times are in microseconds (us).

[--------- upsample_bicubic2d channels_last non-contiguous torch.float32 ---------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+git73137d8
1 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             6117.0 |             1688.2
   [1, 3, 320, 320] -> (512, 512) |            23967.4 |             6644.8
    [32, 128, 64, 64] -> (32, 32) |           679574.0 |            78477.4
  [32, 128, 64, 64] -> (128, 128) |         10334325.5 |           817649.0
     [2, 128, 64, 46] -> (32, 32) |             9828.0 |             4449.2
   [2, 128, 64, 46] -> (128, 128) |           134989.3 |            42817.4
     [1, 128, 64, 46] -> (32, 32) |             4508.2 |             2228.6
   [1, 128, 64, 46] -> (128, 128) |            59404.9 |            21400.4
   [1, 3, 500, 500] -> (256, 256) |             6359.0 |             1712.7
   [1, 3, 500, 500] -> (800, 800) |            58717.6 |            16086.6
6 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             6922.0 |              349.5
   [1, 3, 320, 320] -> (512, 512) |            24916.5 |             1260.2
    [32, 128, 64, 64] -> (32, 32) |           454240.4 |            16491.4
  [32, 128, 64, 64] -> (128, 128) |          7198101.5 |           159921.9
     [2, 128, 64, 46] -> (32, 32) |            10082.8 |              891.1
   [2, 128, 64, 46] -> (128, 128) |           151037.0 |             7704.2
     [1, 128, 64, 46] -> (32, 32) |             4325.5 |              633.9
   [1, 128, 64, 46] -> (128, 128) |            62400.4 |             3853.5
   [1, 3, 500, 500] -> (256, 256) |             6374.9 |              354.9
   [1, 3, 500, 500] -> (800, 800) |            58638.8 |             2992.0

Times are in microseconds (us).

Intermediate benchmark sources:
- results/20210331-092940_pth_nightly_results_1.9.0a0+git8518b0e.log.save
- results/20210331-092940_pr_results_1.9.0a0+git73137d8.log.save
```

[Source file](https://raw.githubusercontent.com/vfdev-5/interpolate-tensoriterator/master/step_seven/results/20210326-061238_pr_1.9.0a0%2Bgita17040a_vs_pth_1.9.0a0%2Bgit8518b0e_results.md)

</details>

This description is based on the benchmarks and the code from [here](https://github.com/vfdev-5/interpolate-tensoriterator/tree/master/step_seven).

Joint work with Francisco Massa (fmassa).

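As a rough illustration of what a "generic" interpolation can look like, the sketch below builds a per-output-index table of source indices and weights, then applies one shared weighted-sum loop for nearest (1 tap), linear (2 taps), and cubic (4 taps). It is plain Python in 1-D and is not the code from this PR: the function names, the half-pixel coordinate mapping, and the cubic coefficient `a = -0.75` are assumptions made for the example.

```python
import math


def cubic_coeffs(t, a=-0.75):
    """Cubic convolution weights for the taps at offsets -1, 0, +1, +2.

    The value of `a` is an assumption for this sketch.
    """
    def near(x):  # kernel for |x| <= 1
        return ((a + 2.0) * x - (a + 3.0)) * x * x + 1.0

    def far(x):  # kernel for 1 < |x| < 2
        return ((a * x - 5.0 * a) * x + 8.0 * a) * x - 4.0 * a

    return [far(t + 1.0), near(t), near(1.0 - t), far(2.0 - t)]


def build_taps(in_size, out_size, mode):
    """Precompute (indices, weights) for every output position."""
    scale = in_size / out_size
    taps = []
    for dst in range(out_size):
        if mode == "nearest":
            idx = [int(math.floor(dst * scale))]
            weights = [1.0]
        elif mode == "linear":
            # Half-pixel mapping, clamped at the left border.
            src = max((dst + 0.5) * scale - 0.5, 0.0)
            i0 = int(math.floor(src))
            t = src - i0
            idx = [i0, i0 + 1]
            weights = [1.0 - t, t]
        elif mode == "cubic":
            src = (dst + 0.5) * scale - 0.5
            i0 = int(math.floor(src))
            t = src - i0
            idx = [i0 - 1, i0, i0 + 1, i0 + 2]
            weights = cubic_coeffs(t)
        else:
            raise ValueError(f"unknown mode: {mode}")
        # Clamp indices so border taps reuse the edge samples.
        idx = [min(max(i, 0), in_size - 1) for i in idx]
        taps.append((idx, weights))
    return taps


def interpolate_1d(row, out_size, mode):
    """One loop for all modes: only the tap table differs."""
    taps = build_taps(len(row), out_size, mode)
    return [sum(row[i] * w for i, w in zip(idx, ws)) for idx, ws in taps]


if __name__ == "__main__":
    signal = [0.0, 1.0, 2.0, 3.0]
    for mode in ("nearest", "linear", "cubic"):
        print(mode, interpolate_1d(signal, 8, mode))
```

The design point this is meant to convey is that the inner loop never branches on the interpolation mode; whether the real kernels are organized this way should be checked against the linked code.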
---

Appendix: Results without original 2d/3d channels last implementation

<details>
<summary>

Quick benchmark results between 8518b0e (master) and [this branch](https://github.com/pytorch/pytorch/compare/master...Quansight:vfdev-5/generic-upsample-tensor-iterator)

</summary>

```
Description:
- 20212303-061238_pth_nightly_results_1.9.0a0+git8518b0e.opencv.6
- 20212303-061238_pth_nightly_results_1.9.0a0+git8518b0e.opencv.1
- 20212303-061238_pr_results_1.9.0a0+gite3a9544.opencv.6
- 20212303-061238_pr_results_1.9.0a0+gite3a9544.opencv.1

[----------------- upsample_bilinear2d channels_first contiguous -----------------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              348.5 |              331.7
   [1, 3, 320, 320] -> (512, 512) |             1254.0 |             1178.1
    [32, 128, 64, 64] -> (32, 32) |            10409.4 |            10009.1
  [32, 128, 64, 64] -> (128, 128) |           210175.8 |           204542.5
   [1, 3, 500, 500] -> (256, 256) |              348.5 |              329.5
   [1, 3, 500, 500] -> (800, 800) |             3079.8 |             2890.1
6 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |               76.4 |               73.4
   [1, 3, 320, 320] -> (512, 512) |              247.1 |              232.0
    [32, 128, 64, 64] -> (32, 32) |             2371.1 |             2340.5
  [32, 128, 64, 64] -> (128, 128) |            62182.6 |            54089.9
   [1, 3, 500, 500] -> (256, 256) |               78.2 |               75.8
   [1, 3, 500, 500] -> (800, 800) |              569.0 |              541.3

Times are in microseconds (us).

[-------------- upsample_bilinear2d channels_first non-contiguous ---------------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              340.5 |              321.9
   [1, 3, 320, 320] -> (512, 512) |             1256.1 |             1179.0
   [1, 3, 500, 500] -> (256, 256) |              351.4 |              332.0
   [1, 3, 500, 500] -> (800, 800) |             3089.1 |             2898.6
6 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |               77.2 |               75.0
   [1, 3, 320, 320] -> (512, 512) |              246.6 |              232.7
   [1, 3, 500, 500] -> (256, 256) |               78.6 |               75.4
   [1, 3, 500, 500] -> (800, 800) |              576.3 |              539.6

Times are in microseconds (us).

[------------------------ upsample_bilinear2d channels_last non-contiguous ------------------------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544 | opencv 4.5.1
1 threads: -----------------------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              971.9 |             1324.6 |         99.6
   [1, 3, 320, 320] -> (512, 512) |             3867.8 |             5329.9 |        271.5
    [32, 128, 64, 64] -> (32, 32) |             6010.6 |             6304.3 |
  [32, 128, 64, 64] -> (128, 128) |           112299.9 |           116956.8 |
     [2, 128, 64, 46] -> (32, 32) |              110.1 |              133.2 |
   [2, 128, 64, 46] -> (128, 128) |             1690.1 |             1838.6 |
     [1, 128, 64, 46] -> (32, 32) |               55.8 |               73.4 |        185.8
   [1, 128, 64, 46] -> (128, 128) |              474.5 |              684.9 |       1445.7
   [1, 3, 500, 500] -> (256, 256) |              972.9 |             1343.0 |        149.5
   [1, 3, 500, 500] -> (800, 800) |             9460.2 |            12925.8 |        685.1
6 threads: -----------------------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              956.6 |              260.1 |         27.1
   [1, 3, 320, 320] -> (512, 512) |             3867.3 |              967.1 |         63.6
    [32, 128, 64, 64] -> (32, 32) |             2489.4 |             2427.0 |
  [32, 128, 64, 64] -> (128, 128) |            37462.1 |            41329.8 |
     [2, 128, 64, 46] -> (32, 32) |               61.2 |               38.9 |
   [2, 128, 64, 46] -> (128, 128) |              904.2 |              652.0 |
     [1, 128, 64, 46] -> (32, 32) |               57.1 |               32.0 |        191.1
   [1, 128, 64, 46] -> (128, 128) |              491.4 |              138.1 |       1485.8
   [1, 3, 500, 500] -> (256, 256) |              977.0 |              257.8 |         36.6
   [1, 3, 500, 500] -> (800, 800) |             9470.0 |             2696.0 |        142.8

Times are in microseconds (us).

[------------- upsample_linear1d channels_first contiguous --------------]
                         | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: ---------------------------------------------------------------
  [4, 512, 320] -> [256] |              516.5 |              524.7
  [4, 512, 320] -> [512] |              993.8 |             1008.0
6 threads: ---------------------------------------------------------------
  [4, 512, 320] -> [256] |              104.3 |              105.4
  [4, 512, 320] -> [512] |              193.5 |              195.6

Times are in microseconds (us).

[-------------------- upsample_trilinear3d channels_first contiguous --------------------]
                                         | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |                5.5 |               11.5
  [1, 3, 16, 320, 320] -> [32, 512, 512] |              116.3 |              213.1
6 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |                1.1 |                2.1
  [1, 3, 16, 320, 320] -> [32, 512, 512] |               36.1 |               47.2

Times are in milliseconds (ms).

[------------------ upsample_trilinear3d channels_last non-contiguous -------------------]
                                         | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |               13.1 |               19.9
  [1, 3, 16, 320, 320] -> [32, 512, 512] |              242.3 |              349.4
6 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |               13.1 |                4.4
  [1, 3, 16, 320, 320] -> [32, 512, 512] |              242.4 |               87.2

Times are in milliseconds (ms).

[------------------ upsample_nearest2d channels_first contiguous -----------------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             1194.5 |              107.8
   [1, 3, 320, 320] -> (512, 512) |             4813.8 |              365.5
    [32, 128, 64, 64] -> (32, 32) |            26745.6 |             6280.6
  [32, 128, 64, 64] -> (128, 128) |           357686.7 |           129032.9
   [1, 3, 500, 500] -> (256, 256) |             1205.9 |              123.8
   [1, 3, 500, 500] -> (800, 800) |            11770.3 |              879.2
6 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              220.2 |               32.7
   [1, 3, 320, 320] -> (512, 512) |              867.2 |               78.7
    [32, 128, 64, 64] -> (32, 32) |             5789.6 |             2241.8
  [32, 128, 64, 64] -> (128, 128) |            89125.3 |            41881.3
   [1, 3, 500, 500] -> (256, 256) |              224.3 |               34.8
   [1, 3, 500, 500] -> (800, 800) |             2182.8 |              176.6

Times are in microseconds (us).

[--------------- upsample_nearest2d channels_first non-contiguous ---------------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             1279.5 |              110.2
   [1, 3, 320, 320] -> (512, 512) |             4908.1 |              367.1
   [1, 3, 500, 500] -> (256, 256) |             1488.1 |              123.4
   [1, 3, 500, 500] -> (800, 800) |            12186.4 |              879.3
6 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              241.8 |               32.6
   [1, 3, 320, 320] -> (512, 512) |              889.0 |               79.2
   [1, 3, 500, 500] -> (256, 256) |              279.2 |               35.6
   [1, 3, 500, 500] -> (800, 800) |             2226.5 |              174.3

Times are in microseconds (us).

[------------------------ upsample_nearest2d channels_last non-contiguous -------------------------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544 | opencv 4.5.1
1 threads: -----------------------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              752.1 |              487.2 |         75.5
   [1, 3, 320, 320] -> (512, 512) |             2992.6 |             1880.0 |        251.4
    [32, 128, 64, 64] -> (32, 32) |             3458.6 |             3466.5 |
  [32, 128, 64, 64] -> (128, 128) |           102350.7 |           103919.4 |
     [2, 128, 64, 46] -> (32, 32) |               75.2 |               85.2 |
   [2, 128, 64, 46] -> (128, 128) |             1637.0 |             1690.4 |
     [1, 128, 64, 46] -> (32, 32) |               39.6 |               47.2 |         37.6
   [1, 128, 64, 46] -> (128, 128) |              426.3 |              449.0 |        412.4
   [1, 3, 500, 500] -> (256, 256) |              757.5 |              495.5 |         85.0
   [1, 3, 500, 500] -> (800, 800) |             7281.4 |             4532.6 |        622.8
6 threads: -----------------------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |              139.3 |              104.1 |         75.7
   [1, 3, 320, 320] -> (512, 512) |              535.5 |              361.2 |         73.0
    [32, 128, 64, 64] -> (32, 32) |             1518.6 |             1458.2 |
  [32, 128, 64, 64] -> (128, 128) |            37117.7 |            40142.4 |
     [2, 128, 64, 46] -> (32, 32) |               17.6 |               26.6 |
   [2, 128, 64, 46] -> (128, 128) |              537.6 |              629.4 |
     [1, 128, 64, 46] -> (32, 32) |               13.7 |               22.1 |         38.8
   [1, 128, 64, 46] -> (128, 128) |               83.6 |               94.5 |        420.2
   [1, 3, 500, 500] -> (256, 256) |              140.8 |              104.9 |         87.8
   [1, 3, 500, 500] -> (800, 800) |             1317.8 |              853.8 |        139.7

Times are in microseconds (us).

[------------- upsample_nearest1d channels_first contiguous -------------]
                         | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: ---------------------------------------------------------------
  [4, 512, 320] -> [256] |             1594.3 |              247.4
  [4, 512, 320] -> [512] |             3222.6 |              440.4
6 threads: ---------------------------------------------------------------
  [4, 512, 320] -> [256] |              294.4 |               53.7
  [4, 512, 320] -> [512] |              575.0 |               88.5

Times are in microseconds (us).

[--------------------- upsample_nearest3d channels_first contiguous ---------------------]
                                         | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |            14952.7 |             1005.7
  [1, 3, 16, 320, 320] -> [32, 512, 512] |           224955.6 |            46228.0
6 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |             2887.2 |              206.2
  [1, 3, 16, 320, 320] -> [32, 512, 512] |            56872.0 |            13566.3

Times are in microseconds (us).

[------------------- upsample_nearest3d channels_last non-contiguous --------------------]
                                         | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |             7772.3 |             4770.9
  [1, 3, 16, 320, 320] -> [32, 512, 512] |           144655.1 |           108605.0
6 threads: -------------------------------------------------------------------------------
   [1, 3, 16, 320, 320] -> [8, 256, 256] |             1401.9 |              877.7
  [1, 3, 16, 320, 320] -> [32, 512, 512] |            35939.6 |            28621.5

Times are in microseconds (us).

[------------------ upsample_bicubic2d channels_first contiguous -----------------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             6038.7 |             2340.4
   [1, 3, 320, 320] -> (512, 512) |            24040.6 |             9205.9
    [32, 128, 64, 64] -> (32, 32) |           471016.3 |            52059.1
  [32, 128, 64, 64] -> (128, 128) |          7705594.5 |           884743.9
   [1, 3, 500, 500] -> (256, 256) |             6061.5 |             2361.9
   [1, 3, 500, 500] -> (800, 800) |            58940.7 |            22401.8
6 threads: ------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             6594.3 |              466.5
   [1, 3, 320, 320] -> (512, 512) |            25361.5 |             1729.1
    [32, 128, 64, 64] -> (32, 32) |           487783.5 |            11550.0
  [32, 128, 64, 64] -> (128, 128) |          7963636.6 |           196017.3
   [1, 3, 500, 500] -> (256, 256) |             6443.8 |              464.1
   [1, 3, 500, 500] -> (800, 800) |            61891.9 |             4257.2

Times are in microseconds (us).

[--------------- upsample_bicubic2d channels_first non-contiguous ---------------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544
1 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             6116.7 |             2357.0
   [1, 3, 320, 320] -> (512, 512) |            24182.0 |             9213.9
   [1, 3, 500, 500] -> (256, 256) |             6349.6 |             2358.5
   [1, 3, 500, 500] -> (800, 800) |            59365.2 |            22431.2
6 threads: -----------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             7155.1 |              464.6
   [1, 3, 320, 320] -> (512, 512) |            24566.8 |             1712.4
   [1, 3, 500, 500] -> (256, 256) |             7217.5 |              466.6
   [1, 3, 500, 500] -> (800, 800) |            59880.2 |             4148.8

Times are in microseconds (us).

[------------------------ upsample_bicubic2d channels_last non-contiguous -------------------------]
                                  | 1.9.0a0+git8518b0e | 1.9.0a0+gite3a9544 | opencv 4.5.1
1 threads: -----------------------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             6184.3 |             2360.0 |        215.0
   [1, 3, 320, 320] -> (512, 512) |            24499.7 |             9231.1 |        510.7
    [32, 128, 64, 64] -> (32, 32) |           548304.5 |            93517.8 |
  [32, 128, 64, 64] -> (128, 128) |          7810958.3 |          1086334.6 |
     [2, 128, 64, 46] -> (32, 32) |            10883.4 |             5594.9 |
   [2, 128, 64, 46] -> (128, 128) |           153253.2 |            57071.2 |
     [1, 128, 64, 46] -> (32, 32) |             4519.4 |             2826.5 |        619.7
   [1, 128, 64, 46] -> (128, 128) |            61339.7 |            28470.7 |       3654.5
   [1, 3, 500, 500] -> (256, 256) |             6444.8 |             2389.9 |        292.9
   [1, 3, 500, 500] -> (800, 800) |            59448.0 |            22479.1 |       1316.9
6 threads: -----------------------------------------------------------------------------------------
   [1, 3, 320, 320] -> (256, 256) |             6370.1 |              464.9 |         61.3
   [1, 3, 320, 320] -> (512, 512) |            25365.6 |             1767.5 |        145.7
    [32, 128, 64, 64] -> (32, 32) |           502888.7 |            22016.3 |
  [32, 128, 64, 64] -> (128, 128) |          8072918.9 |           234567.0 |
     [2, 128, 64, 46] -> (32, 32) |            11171.4 |             1049.5 |
   [2, 128, 64, 46] -> (128, 128) |           152612.5 |            11264.8 |
     [1, 128, 64, 46] -> (32, 32) |             4359.3 |              791.4 |        651.1
   [1, 128, 64, 46] -> (128, 128) |            61346.5 |             7563.9 |       3765.2
   [1, 3, 500, 500] -> (256, 256) |             6644.4 |              469.7 |         77.4
   [1, 3, 500, 500] -> (800, 800) |            59947.2 |             4154.3 |        313.2

Times are in microseconds (us).

Intermediate benchmark sources:
- results/20212303-061238_pth_nightly_results_1.9.0a0+git8518b0e.log.save.opencv
- results/20212303-061238_pr_results_1.9.0a0+gite3a9544.log.save.opencv
```

[Source file](https://raw.githubusercontent.com/vfdev-5/interpolate-tensoriterator/master/step_seven/results/20212303-061238_pr_1.9.0a0%2Bgite3a9544_vs_pth_1.9.0a0%2Bgit8518b0e_results.opencv.md)

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54500

Reviewed By: glaringlee

Differential Revision: D27463566

Pulled By: fmassa

fbshipit-source-id: ceac3a8cee0eeb1a4ddd9344accffcc65449a49a
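
To make the appendix's 1-thread slowdown concrete, the ratios can be computed directly from a few appendix rows. The snippet below is only a reading aid: the helper name `speedup` is hypothetical, and the timings are copied from the appendix tables for input `[1, 3, 320, 320] -> (256, 256)` (master 8518b0e vs. the branch at e3a9544, in microseconds).

```python
def speedup(baseline_us, candidate_us):
    """Ratio > 1 means the candidate is faster than the baseline."""
    return baseline_us / candidate_us


# (label, master time, branch time), taken from the appendix tables.
rows = [
    ("upsample_bilinear2d channels_last, 1 thread", 971.9, 1324.6),
    ("upsample_bilinear2d channels_last, 6 threads", 956.6, 260.1),
    ("upsample_nearest2d channels_last, 1 thread", 752.1, 487.2),
]

for label, master_us, branch_us in rows:
    ratio = speedup(master_us, branch_us)
    verdict = "faster" if ratio > 1.0 else "slower"
    print(f"{label}: {ratio:.2f}x ({verdict} than master)")
```

For bilinear channels last, the generic path is slower than master on one thread and faster on six, which is the trade-off the description cites for keeping the original 2d/3d channels last implementation.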