Fix GPU utilization issue in the mnasnet1_0 model (#548)
Summary:
# Eval
## Batch size scaling analysis
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Time Delta vs. Previous Batch Size
-- | -- | -- | -- | --
1 | 10.641 | 10.578 | 10.651 | -
2 | 14.103 | 14.029 | 14.125 | 0.3253453623
4 | 12.626 | 12.562 | 12.637 | -0.1047294902
8 | 13.09 | 13.025 | 13.099 | 0.03674956439
16 | 17.455 | 13.184 | 17.46 | 0.333460657
32 | 31.719 | 13.723 | 31.724 | 0.8171870524
64 | 62.722 | 15.37 | 62.729 | 0.9774267789
128 | 124.82 | 13.802 | 124.824 | 0.9900513376

Batch size 32 is the best batch size for eval.
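
The GPU Delta column matches the relative change in GPU time from the previous (half as large) batch size, `gpu_time[bs] / gpu_time[bs / 2] - 1`. Recomputing it from the GPU Time column reproduces the listed values. A value near 1 means doubling the batch size roughly doubled GPU time. The minimal Python sketch below shows this; the helper name `relative_deltas` is hypothetical and is not part of the benchmark code.

```python
# Eval GPU times copied from the table above (same units as the table).
batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128]
gpu_time = [10.641, 14.103, 12.626, 13.09, 17.455, 31.719, 62.722, 124.82]


def relative_deltas(times):
    """Return each entry's relative change vs. the previous entry.

    The first entry has no predecessor, so it is reported as None
    (shown as "-" in the table).
    """
    deltas = [None]
    for prev, cur in zip(times, times[1:]):
        deltas.append(cur / prev - 1)
    return deltas


deltas = relative_deltas(gpu_time)
for bs, d in zip(batch_sizes, deltas):
    print(bs, "-" if d is None else round(d, 4))
```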
## Non-idleness analysis

# Train
## Batch size scaling analysis
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Time Delta vs. Previous Batch Size
-- | -- | -- | -- | --
1 | 117.221 | 116.833 | 117.224 | -
2 | 128.477 | 126.024 | 128.492 | 0.09602375001
4 | 153.92 | 151.295 | 153.919 | 0.198035446
8 | 192.844 | 190.238 | 192.844 | 0.2528846154
16 | 271.544 | 268.974 | 271.543 | 0.4081018855
32 | 446.159 | 443.637 | 446.152 | 0.6430449577
64 | 790.263 | 788.029 | 790.25 | 0.7712586768
128 | 1497.106 | 1494.919 | 1497.075 | 0.8944402053

Batch size 32 is the best batch size for train.
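
Another way to read the train table is GPU time per sample, meaning GPU time divided by batch size. This metric is not part of the original analysis. It is an illustrative derivation from the GPU Time column, sketched below in Python with hypothetical variable names.

```python
# Train GPU times copied from the table above (same units as the table).
batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128]
gpu_time = [117.221, 128.477, 153.92, 192.844, 271.544, 446.159, 790.263, 1497.106]

# Per-sample GPU time: total GPU time for one iteration divided by batch size.
per_sample = {bs: t / bs for bs, t in zip(batch_sizes, gpu_time)}

for bs, t in per_sample.items():
    print(f"bs={bs:4d}  per-sample GPU time={t:.3f}")
```

Per-sample GPU time keeps falling all the way to batch size 128, but each doubling gains less. It drops from about 16.97 at batch size 16 to about 13.94 at 32, and then only to about 12.35 at 64.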
## Non-idleness analysis

STABLE_TEST_MODEL: mnasnet1_0
Pull Request resolved: https://github.com/pytorch/benchmark/pull/548
Reviewed By: aaronenyeshi
Differential Revision: D32286394
Pulled By: xuzhao9
fbshipit-source-id: ff46220c9f944d3c173827e72fd4d7c20717a62e