Improve the GPU utilization for timm_nfnet and timm_efficientnet (#502)
Summary:
# Testing Environment
Environment: GCP box
GPU: NVIDIA Tesla T4 (16 GB)
# Timm_nfnet
## Batch Size Scalability Evaluation
Batch Size | GPU Time (ms) | CPU Time (ms) | Time Increase to last BS | GPU speedup
-- | -- | -- | -- | --
8 | GPU Time: 70.618; CPU Dispatch Time: 14.068; CPU Total Wall Time: 70.633 | CPU Total Wall Time: 728.016 | 18% | 10.4x
16 | GPU Time: 97.711; CPU Dispatch Time: 14.114; CPU Total Wall Time: 97.720 | CPU Total Wall Time: 1212.985 | 38% | 12.4x
**32** | GPU Time: 166.798; CPU Dispatch Time: 14.493; CPU Total Wall Time: 166.810 | CPU Total Wall Time: 2698.890 | 71% | 16.1x
64 | GPU Time: 326.841; CPU Dispatch Time: 14.729; CPU Total Wall Time: 326.861 | CPU Total Wall Time: 5427.835 | 96% | 16.6x
The best batch size is 32.
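The two derived columns in the table above can be approximately reproduced from the raw timings. The sketch below assumes that the time increase is the relative growth in GPU time over the previous batch size, and that the GPU speedup is CPU total wall time divided by GPU time. Both formulas are inferred from the reported numbers rather than stated in this commit, and the function names are hypothetical.

```python
def time_increase_pct(prev_ms: float, curr_ms: float) -> float:
    """Relative growth in step time versus the previous batch size, in percent."""
    return (curr_ms / prev_ms - 1.0) * 100.0


def gpu_speedup(cpu_ms: float, gpu_ms: float) -> float:
    """How many times faster the GPU run is than the CPU run."""
    return cpu_ms / gpu_ms


# timm_nfnet evaluation timings copied from the table above:
# batch size -> (GPU time in ms, CPU total wall time in ms)
nfnet_eval = {
    8: (70.618, 728.016),
    16: (97.711, 1212.985),
    32: (166.798, 2698.890),
    64: (326.841, 5427.835),
}


def derived_columns(timings):
    """Return (batch size, time increase % or None, GPU speedup) per row."""
    rows = []
    prev_gpu_ms = None
    for batch_size in sorted(timings):
        gpu_ms, cpu_ms = timings[batch_size]
        # The first row has no previous batch size to compare against.
        increase = None if prev_gpu_ms is None else time_increase_pct(prev_gpu_ms, gpu_ms)
        rows.append((batch_size, increase, gpu_speedup(cpu_ms, gpu_ms)))
        prev_gpu_ms = gpu_ms
    return rows


if __name__ == "__main__":
    for batch_size, increase, speedup in derived_columns(nfnet_eval):
        increase_text = "-" if increase is None else f"{increase:.0f}%"
        print(f"{batch_size:>3} | {increase_text:>4} | {speedup:.1f}x")
```

A time increase close to 100% means that doubling the batch size roughly doubles the step time, so the GPU gains little additional throughput from the larger batch.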
## Batch Size Scalability Train
Batch Size | GPU Time (ms) | CPU Time (ms) | Time Increase to last BS | GPU speedup
-- | -- | -- | -- | --
8 | GPU Time: 219.862; CPU Dispatch Time: 202.673; CPU Total Wall Time: 219.868 | CPU Total Wall Time: 2107.537 | - | -
16 | GPU Time: 330.576; CPU Dispatch Time: 313.206; CPU Total Wall Time: 330.577 | CPU Total Wall Time: 3786.673 | 50% | 11.47x
32 | GPU Time: 566.907; CPU Dispatch Time: 549.858; CPU Total Wall Time: 566.908 | CPU Total Wall Time: 7861.868 | 80% | 13.8x
64 | GPU Time: 1066.371; CPU Dispatch Time: 1049.278; CPU Total Wall Time: 1066.358 | CPU Total Wall Time: 17435.319 | 88% | 16x
128 | GPU Time: 2007.782; CPU Dispatch Time: 1990.756; CPU Total Wall Time: 2007.747 | N/A | 88% | -
256 | CUDA OOM | N/A | - | -
The best batch size is 16.
# Timm_efficientnet
## Batch Size Scalability Evaluation
Batch Size | GPU Time (ms) | CPU Time (ms) | Time Increase to last BS
-- | -- | -- | --
8 | GPU Time: 27.221; CPU Dispatch Time: 12.071; CPU Total Wall Time: 27.229 | CPU Total Wall Time: 203.290 | -
16 | GPU Time: 45.054; CPU Dispatch Time: 12.198; CPU Total Wall Time: 45.060 | CPU Total Wall Time: 461.532 | 67%
32 | GPU Time: 80.333; CPU Dispatch Time: 14.775; CPU Total Wall Time: 80.340 | CPU Total Wall Time: 1163.242 | 77%
64 | GPU Time: 126.814; CPU Dispatch Time: 12.516; CPU Total Wall Time: 126.831 | CPU Total Wall Time: 2730.520 | 57%
128 | GPU Time: 228.803; CPU Dispatch Time: 13.312; CPU Total Wall Time: 228.808 | CPU Total Wall Time: 6111.540 | 91%
The best batch size is 64.
## Batch Size Scalability Train
Batch Size | GPU Time (ms) | CPU Time (ms) | Time Increase to last BS
-- | -- | -- | --
8 | GPU Time: 104.506; CPU Dispatch Time: 103.768; CPU Total Wall Time: 104.511 | CPU Total Wall Time: 770.408 | -
16 | GPU Time: 150.113; CPU Dispatch Time: 148.315; CPU Total Wall Time: 150.117 | CPU Total Wall Time: 1540.900 | 44%
32 | GPU Time: 231.292; CPU Dispatch Time: 230.643; CPU Total Wall Time: 231.295 | CPU Total Wall Time: 3098.839 | 54%
64 | GPU Time: 445.225; CPU Dispatch Time: 444.463; CPU Total Wall Time: 445.223 | CPU Total Wall Time: 6643.432 | 92%
The best batch size is 32.
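This commit does not state how each best batch size was chosen. One reading that happens to match all four tables is: keep doubling the batch size until the reported time increase reaches about 80%, then take the last batch size before that point. The sketch below encodes that reading. The 80% threshold and the function name `pick_batch_size` are assumptions made for illustration, not part of the benchmark code.

```python
from typing import Optional, Sequence, Tuple

# Each row is (batch size, reported time increase in percent, or None if not reported).
Row = Tuple[int, Optional[float]]


def pick_batch_size(rows: Sequence[Row], threshold_pct: float = 80.0) -> int:
    """Return the last batch size seen before the time increase reaches threshold_pct.

    The threshold is an assumed cutoff for "step time now scales almost
    linearly with batch size", i.e. the GPU is close to saturated.
    """
    best = rows[0][0]
    for batch_size, increase_pct in rows:
        if increase_pct is not None and increase_pct >= threshold_pct:
            break
        best = batch_size
    return best


# Reported "Time Increase to last BS" values copied from the four tables.
nfnet_eval = [(8, 18.0), (16, 38.0), (32, 71.0), (64, 96.0)]
nfnet_train = [(8, None), (16, 50.0), (32, 80.0), (64, 88.0), (128, 88.0)]
efficientnet_eval = [(8, None), (16, 67.0), (32, 77.0), (64, 57.0), (128, 91.0)]
efficientnet_train = [(8, None), (16, 44.0), (32, 54.0), (64, 92.0)]

if __name__ == "__main__":
    for name, rows in [
        ("timm_nfnet eval", nfnet_eval),
        ("timm_nfnet train", nfnet_train),
        ("timm_efficientnet eval", efficientnet_eval),
        ("timm_efficientnet train", efficientnet_train),
    ]:
        print(f"{name}: {pick_batch_size(rows)}")
```

A different cutoff, or a rule based on throughput rather than time increase, could select different batch sizes, so this should be read as one possible heuristic rather than the method actually used.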
Pull Request resolved: https://github.com/pytorch/benchmark/pull/502
Reviewed By: aaronenyeshi
Differential Revision: D31785712
Pulled By: xuzhao9
fbshipit-source-id: aa6ba559dee51bb248ddb20947367f73fd32e381