Fix GPU utilization issue of timm_vovnet and timm_vision_transformer (#540)
Summary:
# timm_vovnet
## Eval
### batch size analysis
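The best batch size is 32. GPU Delta is the relative increase in GPU time over the previous row, i.e. the cost of doubling the batch size.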
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta
-- | -- | -- | -- | --
1 | 17.213 | 4.315 | 17.22 | -
2 | 20.551 | 4.945 | 20.557 | 0.1939231976
4 | 28.301 | 4.143 | 28.306 | 0.3771106029
8 | 39.55 | 4.112 | 39.558 | 0.3974771209
16 | 81.468 | 5.151 | 81.479 | 1.059873578
32 | 136.573 | 4.812 | 136.582 | 0.6764005499
64 | 267.579 | 4.988 | 267.61 | 0.9592379167
128 | 517.368 | 5.122 | 517.383 | 0.9335149619
256 | 1031.46 | 4.877 | 1031.477 | 0.9936679501
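
As a quick check on the table above, the GPU Delta column can be recomputed from the GPU Time column. This is a minimal sketch, not code from this PR: the helper name `gpu_delta` is hypothetical, and the formula (relative increase over the previous row) is inferred from the numbers in the table.

```python
# GPU Time column copied from the timm_vovnet eval table (units as reported there).
GPU_TIME = {
    1: 17.213, 2: 20.551, 4: 28.301, 8: 39.55, 16: 81.468,
    32: 136.573, 64: 267.579, 128: 517.368, 256: 1031.46,
}


def gpu_delta(times):
    """Relative GPU-time increase of each batch size over the previous row.

    Hypothetical helper: the formula is inferred from the table, not from the PR.
    """
    sizes = sorted(times)
    return {
        cur: (times[cur] - times[prev]) / times[prev]
        for prev, cur in zip(sizes, sizes[1:])
    }


deltas = gpu_delta(GPU_TIME)
for batch_size, delta in deltas.items():
    print(f"{batch_size:>4} | {delta:.10f}")
```

A delta near 1.0 means GPU time doubled along with the batch size; a delta well below 1.0 means the larger batch was cheaper per sample.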
### non-idleness analysis

## Train
### batch size analysis
The best batch size is 32.
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta
-- | -- | -- | -- | --
1 | 17.231 | 5.002 | 17.237 | -
2 | 20.536 | 4.695 | 20.544 | 0.1918054669
4 | 28.237 | 4.201 | 28.244 | 0.375
8 | 39.555 | 4.197 | 39.561 | 0.400821617
16 | 81.449 | 4.037 | 81.454 | 1.059132853
32 | 136.489 | 4.261 | 136.501 | 0.6757602917
64 | 267.643 | 4.774 | 267.651 | 0.960912601
128 | 517.897 | 5.962 | 517.922 | 0.9350291246
256 | 1033.853 | 4.978 | 1033.87 | 0.9962521505
### non-idleness analysis

# timm_vision_transformer
## Eval
### batch size analysis
The best batch size is 8.
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta
-- | -- | -- | -- | --
1 | 10.215 | 3.965 | 10.224 | -
2 | 17.475 | 4.088 | 17.481 | 0.7107195301
4 | 34.131 | 3.773 | 34.137 | 0.9531330472
8 | 52.965 | 4.156 | 52.972 | 0.5518150655
16 | 100.916 | 4.076 | 100.926 | 0.9053337109
32 | 194.475 | 5.722 | 194.492 | 0.9270977843
64 | 407.979 | 5.627 | 407.995 | 1.097848052
128 | 854.217 | 4.718 | 854.24 | 1.093776886
256 | 1552.927 | 5.087 | 1552.939 | 0.8179537518
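
The PR does not state how the best batch size was chosen. One rule that happens to reproduce both eval results is to take the largest batch size whose GPU Delta is still clearly below 1.0. The sketch below is an assumption, not the author's method: `pick_batch_size` and the 0.8 cutoff are hypothetical.

```python
# GPU Delta columns copied from the two eval tables (batch size -> delta).
VOVNET_DELTA = {
    2: 0.1939231976, 4: 0.3771106029, 8: 0.3974771209, 16: 1.059873578,
    32: 0.6764005499, 64: 0.9592379167, 128: 0.9335149619, 256: 0.9936679501,
}
VIT_DELTA = {
    2: 0.7107195301, 4: 0.9531330472, 8: 0.5518150655, 16: 0.9053337109,
    32: 0.9270977843, 64: 1.097848052, 128: 1.093776886, 256: 0.8179537518,
}

# Hypothetical cutoff, not taken from the PR: a delta below this value is
# treated as "doubling the batch was clearly cheaper than 2x GPU time".
SUBLINEAR_THRESHOLD = 0.8


def pick_batch_size(deltas, threshold=SUBLINEAR_THRESHOLD):
    """Return the largest batch size whose GPU Delta is below the threshold."""
    candidates = [bs for bs, delta in deltas.items() if delta < threshold]
    # Fall back to batch size 1 if no doubling was sublinear.
    return max(candidates) if candidates else 1


print(pick_batch_size(VOVNET_DELTA))  # 32
print(pick_batch_size(VIT_DELTA))     # 8
```

The result is sensitive to the cutoff: the timm_vision_transformer delta at batch size 256 is about 0.818, so a threshold slightly above that would select 256 instead of 8.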
### non-idleness analysis

## Train
### batch size analysis
The best batch size is 8.
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta
-- | -- | -- | -- | --
1 | 10.181 | 3.925 | 10.188 | -
2 | 17.433 | 4.207 | 17.442 | 0.712307239
4 | 34.12 | 4.076 | 34.129 | 0.9572075948
8 | 52.987 | 4.203 | 53.007 | 0.5529601407
16 | 100.975 | 5.067 | 100.988 | 0.9056561043
32 | 194.353 | 4.299 | 194.377 | 0.9247635553
64 | 416.867 | 4.474 | 416.89 | 1.144896143
128 | 876.685 | 4.464 | 876.706 | 1.103032862
256 | 1590.112 | 4.566 | 1590.116 | 0.8137780389
### non-idleness analysis

Pull Request resolved: https://github.com/pytorch/benchmark/pull/540
Reviewed By: aaronenyeshi
Differential Revision: D32104631
Pulled By: xuzhao9
fbshipit-source-id: cd917330a5c60d65f58fdeb2d160ccd3cc4d2343