Fix GPU utilization issue for shufflenet_v2 (#551)
Summary:
# Eval
## Batch scaling analysis
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta
-- | -- | -- | -- | --
1 | 12.282 | 12.219 | 12.293 | -
2 | 15.328 | 15.242 | 15.341 | 0.2480052109
4 | 15.686 | 15.614 | 15.7 | 0.0233559499
8 | 15.836 | 15.764 | 15.847 | 0.009562667347
16 | 15.36 | 15.275 | 15.371 | -0.03005809548
32 | 15.515 | 15.297 | 15.519 | 0.01009114583
64 | 25.573 | 15.644 | 25.596 | 0.6482758621
128 | 50.018 | 15.881 | 50.021 | 0.9558909788
best bs = 64
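
The GPU Delta column appears to be the relative change in GPU time compared with the previous (half-sized) batch. This is inferred from the numbers and is not stated in the PR. A minimal sketch that reproduces the Eval column under that assumption (the helper name `gpu_deltas` is hypothetical):

```python
def gpu_deltas(gpu_times):
    """Relative change in GPU time between consecutive batch sizes.

    Assumes delta_i = (t_i - t_{i-1}) / t_{i-1}, which matches the table values.
    """
    return [(cur - prev) / prev for prev, cur in zip(gpu_times, gpu_times[1:])]


# GPU Time column of the Eval table, batch sizes 1 through 128.
eval_gpu_time = [12.282, 15.328, 15.686, 15.836, 15.36, 15.515, 25.573, 50.018]
deltas = gpu_deltas(eval_gpu_time)

for batch_size, delta in zip([2, 4, 8, 16, 32, 64, 128], deltas):
    print(f"bs={batch_size}: {delta:.10f}")
```

A delta near 0 means doubling the batch costs almost no extra GPU time, while a delta near 1 means GPU time doubles along with the batch size.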
## Non-idleness analysis

# Train
## Batch scaling analysis
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta
-- | -- | -- | -- | --
1 | 123.335 | 123.266 | 123.347 | -
2 | 163.121 | 163.031 | 163.136 | 0.3225848299
4 | 130.795 | 130.185 | 130.797 | -0.1981719092
8 | 141.061 | 139.277 | 141.062 | 0.07848923889
16 | 201.191 | 199.659 | 201.19 | 0.4262694863
32 | 242.508 | 240.703 | 242.507 | 0.2053620689
64 | 345.422 | 343.727 | 345.419 | 0.4243736289
128 | 582.909 | 582.8 | 582.916 | 0.6875271407
256 | 1057.891 | 1056.498 | 1057.869 | 0.8148476006
512 | 2046.851 | 2046.696 | 2046.82 | 0.9348411131
best bs = 128
## Non-idleness analysis

STABLE_TEST_MODEL: shufflenet_v2_x1_0
Pull Request resolved: https://github.com/pytorch/benchmark/pull/551
Reviewed By: aaronenyeshi
Differential Revision: D32286365
Pulled By: xuzhao9
fbshipit-source-id: 75f7ec69ccc654d3efdc140d02af4fb1cef70232