benchmark
e6ef04b4 - Fix GPU utilization issue of timm_vovnet and timm_vision_transformer (#540)

Commit
4 years ago
Summary:

# timm_vovnet

## Eval

### batch size analysis

best batch size is 32

Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta
-- | -- | -- | -- | --
1 | 17.213 | 4.315 | 17.22 | -
2 | 20.551 | 4.945 | 20.557 | 0.1939231976
4 | 28.301 | 4.143 | 28.306 | 0.3771106029
8 | 39.55 | 4.112 | 39.558 | 0.3974771209
16 | 81.468 | 5.151 | 81.479 | 1.059873578
32 | 136.573 | 4.812 | 136.582 | 0.6764005499
64 | 267.579 | 4.988 | 267.61 | 0.9592379167
128 | 517.368 | 5.122 | 517.383 | 0.9335149619
256 | 1031.46 | 4.877 | 1031.477 | 0.9936679501

### non-idleness analysis

![image](https://user-images.githubusercontent.com/502017/139758563-089e129c-8d74-416b-abb4-da11ce42d93c.png)

## Train

### batch size analysis

best batch size is 32

Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta
-- | -- | -- | -- | --
1 | 17.231 | 5.002 | 17.237 | -
2 | 20.536 | 4.695 | 20.544 | 0.1918054669
4 | 28.237 | 4.201 | 28.244 | 0.375
8 | 39.555 | 4.197 | 39.561 | 0.400821617
16 | 81.449 | 4.037 | 81.454 | 1.059132853
32 | 136.489 | 4.261 | 136.501 | 0.6757602917
64 | 267.643 | 4.774 | 267.651 | 0.960912601
128 | 517.897 | 5.962 | 517.922 | 0.9350291246
256 | 1033.853 | 4.978 | 1033.87 | 0.9962521505

### non-idleness analysis

![image](https://user-images.githubusercontent.com/502017/139758640-6c73ffff-8b5c-4778-ba4e-983f7505d94d.png)

# timm_vision_transformer

## Eval

### batch size analysis

best batch size is 8

Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta
-- | -- | -- | -- | --
1 | 10.215 | 3.965 | 10.224 | -
2 | 17.475 | 4.088 | 17.481 | 0.7107195301
4 | 34.131 | 3.773 | 34.137 | 0.9531330472
8 | 52.965 | 4.156 | 52.972 | 0.5518150655
16 | 100.916 | 4.076 | 100.926 | 0.9053337109
32 | 194.475 | 5.722 | 194.492 | 0.9270977843
64 | 407.979 | 5.627 | 407.995 | 1.097848052
128 | 854.217 | 4.718 | 854.24 | 1.093776886
256 | 1552.927 | 5.087 | 1552.939 | 0.8179537518

### non-idleness analysis

![image](https://user-images.githubusercontent.com/502017/139758773-bac00099-dd3d-412f-bd5a-fb600fc47ac3.png)

## Train

### batch size analysis

best batch size is 8

Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta
-- | -- | -- | -- | --
1 | 10.181 | 3.925 | 10.188 | -
2 | 17.433 | 4.207 | 17.442 | 0.712307239
4 | 34.12 | 4.076 | 34.129 | 0.9572075948
8 | 52.987 | 4.203 | 53.007 | 0.5529601407
16 | 100.975 | 5.067 | 100.988 | 0.9056561043
32 | 194.353 | 4.299 | 194.377 | 0.9247635553
64 | 416.867 | 4.474 | 416.89 | 1.144896143
128 | 876.685 | 4.464 | 876.706 | 1.103032862
256 | 1590.112 | 4.566 | 1590.116 | 0.8137780389

### non-idleness analysis

![image](https://user-images.githubusercontent.com/502017/139758843-1a0502be-45a2-4067-bbe1-fe5e4ad22781.png)

Pull Request resolved: https://github.com/pytorch/benchmark/pull/540

Reviewed By: aaronenyeshi

Differential Revision: D32104631

Pulled By: xuzhao9

fbshipit-source-id: cd917330a5c60d65f58fdeb2d160ccd3cc4d2343
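
The commit message does not define the GPU Delta column. Its values are consistent with the relative increase in GPU time from one batch size to the next, that is `(t_cur - t_prev) / t_prev`. The sketch below reproduces the column for the timm_vovnet eval numbers under that assumption. The function name `gpu_deltas` is hypothetical and is not part of the benchmark repository.

```python
def gpu_deltas(gpu_times):
    """Relative GPU-time increase between consecutive batch sizes.

    Assumed definition (inferred from the table values, not stated in
    the commit): delta = (t_cur - t_prev) / t_prev. The first entry has
    no predecessor, so it is None (shown as "-" in the tables).
    """
    deltas = [None]
    for prev, cur in zip(gpu_times, gpu_times[1:]):
        deltas.append((cur - prev) / prev)
    return deltas


# GPU Time column of the timm_vovnet eval table, batch sizes 1 through 256.
batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256]
vovnet_eval_gpu = [17.213, 20.551, 28.301, 39.55, 81.468,
                   136.573, 267.579, 517.368, 1031.46]

for bs, delta in zip(batch_sizes, gpu_deltas(vovnet_eval_gpu)):
    print(bs, "-" if delta is None else round(delta, 4))
```

Under this reading, a delta close to 1.0 means that doubling the batch size roughly doubles GPU time, while a delta well below 1.0 means the larger batch was absorbed at less than proportional cost.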