benchmark
afa744e1 - Improve the GPU utilization for timm_nfnet and timm_efficientnet (#502)

Commit
4 years ago
Improve the GPU utilization for timm_nfnet and timm_efficientnet (#502)

Summary:

# Testing Environment

Environment: GCP box
GPU: Nvidia Tesla T4 (16 GB)

# Timm_nfnet

## Batch Size Scalability Evaluation

Batch Size | GPU run: GPU Time (ms) | GPU run: CPU Dispatch Time (ms) | GPU run: CPU Total Wall Time (ms) | CPU run: CPU Total Wall Time (ms) | Time Increase to last BS | GPU speedup
-- | -- | -- | -- | -- | -- | --
8 | 70.618 | 14.068 | 70.633 | 728.016 | 18% | 10.4x
16 | 97.711 | 14.114 | 97.720 | 1212.985 | 38% | 12.4x
**32** | 166.798 | 14.493 | 166.810 | 2698.890 | 71% | 16.1x
64 | 326.841 | 14.729 | 326.861 | 5427.835 | 96% | 16.6x

The best batch size is 32.
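The two derived columns in the evaluation table above can be recomputed from the raw timings. The sketch below assumes "Time Increase to last BS" is the relative growth in GPU time over the previous batch size, and "GPU speedup" is the CPU-run wall time divided by the GPU time; the commit does not spell out either definition, and the function names are illustrative only. Under these assumptions the results agree with the reported figures to within rounding (about 0.1x for the speedups).

```python
# Figures copied from the timm_nfnet evaluation table (milliseconds).
gpu_time_ms = {8: 70.618, 16: 97.711, 32: 166.798, 64: 326.841}
cpu_wall_ms = {8: 728.016, 16: 1212.985, 32: 2698.890, 64: 5427.835}


def time_increase_pct(times, batch_size, previous_batch_size):
    """Relative growth in time over the previous batch size, in percent (assumed definition)."""
    return (times[batch_size] / times[previous_batch_size] - 1.0) * 100.0


def gpu_speedup(cpu_times, gpu_times, batch_size):
    """CPU wall time divided by GPU time at the same batch size (assumed definition)."""
    return cpu_times[batch_size] / gpu_times[batch_size]


batch_sizes = sorted(gpu_time_ms)

# The smallest batch size has no predecessor in the table, so it gets no entry here.
increases = {
    bs: time_increase_pct(gpu_time_ms, bs, prev)
    for prev, bs in zip(batch_sizes, batch_sizes[1:])
}
speedups = {bs: gpu_speedup(cpu_wall_ms, gpu_time_ms, bs) for bs in batch_sizes}

for bs in batch_sizes:
    inc = increases.get(bs)
    inc_text = "-" if inc is None else f"{inc:.0f}%"
    print(f"bs={bs:>3}  increase={inc_text:>4}  speedup={speedups[bs]:.1f}x")
```

A time increase close to 100% means that doubling the batch size roughly doubles the GPU time, so the larger batch no longer yields more throughput per step.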
## Batch Size Scalability Train

Batch Size | GPU run: GPU Time (ms) | GPU run: CPU Dispatch Time (ms) | GPU run: CPU Total Wall Time (ms) | CPU run: CPU Total Wall Time (ms) | Time Increase to last BS | GPU speedup
-- | -- | -- | -- | -- | -- | --
8 | 219.862 | 202.673 | 219.868 | 2107.537 | - |
16 | 330.576 | 313.206 | 330.577 | 3786.673 | 50% | 11.47x
32 | 566.907 | 549.858 | 566.908 | 7861.868 | 80% | 13.8x
64 | 1066.371 | 1049.278 | 1066.358 | 17435.319 | 88% | 16x
128 | 2007.782 | 1990.756 | 2007.747 | N/A | 88% | -
256 | CUDA OOM | | | N/A | |

The best batch size is 16.

# Timm_efficientnet

## Batch Size Scalability Evaluation

Batch Size | GPU run: GPU Time (ms) | GPU run: CPU Dispatch Time (ms) | GPU run: CPU Total Wall Time (ms) | CPU run: CPU Total Wall Time (ms) | Time Increase to last BS
-- | -- | -- | -- | -- | --
8 | 27.221 | 12.071 | 27.229 | 203.290 | -
16 | 45.054 | 12.198 | 45.060 | 461.532 | 67%
32 | 80.333 | 14.775 | 80.340 | 1163.242 | 77%
64 | 126.814 | 12.516 | 126.831 | 2730.520 | 57%
128 | 228.803 | 13.312 | 228.808 | 6111.540 | 91%

The best batch size is 64.
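The commit does not say how the best batch size was selected. One rule that happens to match the reported choices is to keep the largest batch size reached before the "Time Increase to last BS" value hits 80%. The sketch below illustrates that rule only; the 80% threshold and the `pick_batch_size` helper are assumptions made for this example, not the author's stated method.

```python
def pick_batch_size(rows, threshold_pct=80.0):
    """Return the last batch size seen before the time increase reaches the threshold.

    rows: (batch_size, increase_pct or None) pairs in ascending batch-size order.
    threshold_pct: hypothetical cutoff; the commit does not define one.
    """
    best = None
    for batch_size, increase_pct in rows:
        if increase_pct is not None and increase_pct >= threshold_pct:
            break  # scaling is close to linear from here on, so stop growing the batch
        best = batch_size
    return best


# "Time Increase to last BS" values as reported in the two tables above.
nfnet_train = [(8, None), (16, 50), (32, 80), (64, 88), (128, 88)]
efficientnet_eval = [(8, None), (16, 67), (32, 77), (64, 57), (128, 91)]

print(pick_batch_size(nfnet_train))        # 16
print(pick_batch_size(efficientnet_eval))  # 64
```

A different threshold would select different batch sizes, so the cutoff should be treated as a tunable parameter rather than a fixed property of the benchmark.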
## Batch Size Scalability Train

Batch Size | GPU run: GPU Time (ms) | GPU run: CPU Dispatch Time (ms) | GPU run: CPU Total Wall Time (ms) | CPU run: CPU Total Wall Time (ms) | Time Increase to last BS
-- | -- | -- | -- | -- | --
8 | 104.506 | 103.768 | 104.511 | 770.408 | -
16 | 150.113 | 148.315 | 150.117 | 1540.900 | 44%
32 | 231.292 | 230.643 | 231.295 | 3098.839 | 54%
64 | 445.225 | 444.463 | 445.223 | 6643.432 | 92%

The best batch size is 32.

Pull Request resolved: https://github.com/pytorch/benchmark/pull/502

Reviewed By: aaronenyeshi

Differential Revision: D31785712

Pulled By: xuzhao9

fbshipit-source-id: aa6ba559dee51bb248ddb20947367f73fd32e381