Fix GPU utilization issue of mobilenet_v3_large (#543)
Summary:
Fixes https://github.com/pytorch/benchmark/issues/451
# mobilenet_v3_large
## Eval
### batch size analysis
The best batch size is 32.
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta (vs. previous batch size)
-- | -- | -- | -- | --
1 | 10.052 | 9.976 | 10.07 | -
2 | 10.365 | 10.296 | 10.378 | 0.03113808197
4 | 8.466 | 8.401 | 8.477 | -0.1832127352
8 | 9.462 | 8.322 | 9.466 | 0.1176470588
16 | 16.02 | 8.317 | 16.025 | 0.693088142
32 | 28.324 | 8.51 | 28.33 | 0.7680399501
64 | 54.698 | 9.421 | 54.704 | 0.9311537918
128 | 106.29 | 8.436 | 106.298 | 0.9432154741
256 | 210.956 | 9.009 | 210.97 | 0.9847210462
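The GPU Delta column can be reproduced from the GPU Time column. Each row's delta is the GPU time at that batch size minus the GPU time at the previous batch size, divided by the previous GPU time, which is why batch size 1 has no value. The minimal Python sketch below recomputes it from the eval numbers in the table; `gpu_deltas` is a hypothetical helper written for illustration, not part of the torchbench code.

```python
# GPU time per batch size, copied from the eval table above.
eval_gpu_time = {
    1: 10.052, 2: 10.365, 4: 8.466, 8: 9.462, 16: 16.02,
    32: 28.324, 64: 54.698, 128: 106.29, 256: 210.956,
}


def gpu_deltas(gpu_time):
    """Relative change in GPU time versus the next-smaller batch size."""
    sizes = sorted(gpu_time)
    deltas = {}
    # Pair each batch size with the one before it; the smallest has no delta.
    for prev, cur in zip(sizes, sizes[1:]):
        deltas[cur] = (gpu_time[cur] - gpu_time[prev]) / gpu_time[prev]
    return deltas


for bs, delta in gpu_deltas(eval_gpu_time).items():
    print(f"{bs:>4} {delta:+.4f}")
```

A delta close to 1 means doubling the batch size roughly doubles GPU time, i.e. the GPU is doing proportionally more work per iteration.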
### non-idleness analysis (bs=32)

## Train
### batch size analysis
The best batch size is 32.
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta (vs. previous batch size)
-- | -- | -- | -- | --
1 | 8.467 | 8.39 | 8.48 | -
2 | 8.582 | 8.515 | 8.592 | 0.01358214244
4 | 8.343 | 8.281 | 8.354 | -0.02784898625
8 | 10.209 | 10.139 | 10.222 | 0.2236605538
16 | 16.046 | 8.469 | 16.051 | 0.5717504163
32 | 28.318 | 9.517 | 28.323 | 0.7648011966
64 | 54.681 | 8.826 | 54.685 | 0.9309626386
128 | 106.291 | 8.803 | 106.341 | 0.9438378962
256 | 210.822 | 10.797 | 210.834 | 0.9834416837
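To compare the two timing columns directly, one illustrative metric (an assumption here, not necessarily the criterion used to pick batch size 32) is the ratio of GPU time to CPU dispatch time. In the train numbers it stays close to 1 up to batch size 8 and grows steadily from 16 onward. The helper `gpu_to_dispatch_ratio` below is hypothetical.

```python
# (GPU time, CPU dispatch time) per batch size, copied from the train table above.
train_times = {
    1: (8.467, 8.39), 2: (8.582, 8.515), 4: (8.343, 8.281),
    8: (10.209, 10.139), 16: (16.046, 8.469), 32: (28.318, 9.517),
    64: (54.681, 8.826), 128: (106.291, 8.803), 256: (210.822, 10.797),
}


def gpu_to_dispatch_ratio(times):
    """GPU time divided by CPU dispatch time for each batch size.

    A ratio near 1 suggests GPU time is bounded by CPU dispatch;
    a ratio well above 1 suggests GPU work dominates the iteration.
    """
    return {bs: gpu / cpu for bs, (gpu, cpu) in sorted(times.items())}


for bs, ratio in gpu_to_dispatch_ratio(train_times).items():
    print(f"{bs:>4} {ratio:6.2f}")
```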
### non-idleness analysis (bs=32)

Pull Request resolved: https://github.com/pytorch/benchmark/pull/543
Reviewed By: aaronenyeshi
Differential Revision: D32139005
Pulled By: xuzhao9
fbshipit-source-id: ce626baab0f22854b21e8f007d421751b49ac46d