Fix GPU utilization issue of vgg16 (#553)
Summary:
# Eval
## Batch scaling analysis
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta (vs. previous batch size)
-- | -- | -- | -- | --
1 | 13.322 | 4.303 | 13.326 | -
2 | 21.55 | 3.821 | 21.553 | 0.6176249812
4 | 37.348 | 3.841 | 37.352 | 0.7330858469
8 | 71.241 | 4.125 | 71.254 | 0.9074916997
16 | 153.317 | 3.934 | 153.325 | 1.152089387
32 | 291.51 | 4.678 | 291.51 | 0.9013547095
64 | 631.886 | 5.115 | 631.904 | 1.167630613
128 | 1138.609 | 4.618 | 1138.614 | 0.8019215491
best bs=4
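The GPU Delta column matches the relative increase in GPU time each time the batch size doubles, i.e. `delta = t(bs) / t(bs/2) - 1` (for example, 21.55 / 13.322 - 1 ≈ 0.6176 at bs=2). The sketch below recomputes the column from the eval GPU times. The dictionary name `EVAL_GPU_TIME` and the helper `gpu_delta` are hypothetical names used only for illustration. The units are whatever the benchmark reports.

```python
# GPU time per batch size, copied from the eval table above.
EVAL_GPU_TIME = {
    1: 13.322,
    2: 21.55,
    4: 37.348,
    8: 71.241,
    16: 153.317,
    32: 291.51,
    64: 631.886,
    128: 1138.609,
}


def gpu_delta(gpu_time):
    """Hypothetical helper: relative GPU-time increase when the batch size doubles.

    delta[bs] = gpu_time[bs] / gpu_time[previous bs] - 1
    The smallest batch size has no predecessor, so it gets no entry.
    """
    sizes = sorted(gpu_time)
    return {
        cur: gpu_time[cur] / gpu_time[prev] - 1
        for prev, cur in zip(sizes, sizes[1:])
    }


if __name__ == "__main__":
    for bs, delta in gpu_delta(EVAL_GPU_TIME).items():
        print(f"bs={bs:>3}  delta={delta:.10f}")
```

A delta well below 1.0 means doubling the batch costs less than twice the GPU time, so per-sample throughput improves; a delta near or above 1.0 means the GPU is already saturated.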
## Non-idleness analysis

# Train
## Batch scaling analysis
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta (vs. previous batch size)
-- | -- | -- | -- | --
1 | 328.723 | 133.978 | 328.722 | -
2 | 422.562 | 168.257 | 422.571 | 0.2854652702
4 | 609.029 | 261.781 | 609.038 | 0.4412772564
8 | 995.391 | 409.359 | 995.402 | 0.6343901522
16 | 1820.458 | 792.342 | 1820.451 | 0.8288873418
32 | 3368.939 | 1438.867 | 3368.897 | 0.8505996843
64 | 6409.849 | 2821.981 | 6409.767 | 0.9026313626
128 | 12717.985 | 5453.158 | 12717.686 | 0.9841317635
best bs=8
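The summary does not say how the "best" batch size was chosen. One rule that happens to reproduce both picks (bs=4 for eval, bs=8 for train) is "keep doubling the batch size while the GPU Delta stays below a threshold". The rule, the 0.75 threshold, and the helper `pick_batch_size` are all assumptions for illustration, not the criterion used in this PR.

```python
# GPU Delta columns copied from the eval and train tables above.
EVAL_GPU_DELTA = {
    2: 0.6176249812, 4: 0.7330858469, 8: 0.9074916997, 16: 1.152089387,
    32: 0.9013547095, 64: 1.167630613, 128: 0.8019215491,
}
TRAIN_GPU_DELTA = {
    2: 0.2854652702, 4: 0.4412772564, 8: 0.6343901522, 16: 0.8288873418,
    32: 0.8505996843, 64: 0.9026313626, 128: 0.9841317635,
}


def pick_batch_size(deltas, threshold=0.75, start=1):
    """Hypothetical selection rule.

    Walk batch sizes in increasing order and stop at the first doubling whose
    GPU-time increase reaches `threshold`; return the last size before that.
    The 0.75 default is an assumed value, not taken from the benchmark.
    """
    best = start
    for bs in sorted(deltas):
        if deltas[bs] >= threshold:
            break
        best = bs
    return best


if __name__ == "__main__":
    print("eval:", pick_batch_size(EVAL_GPU_DELTA))
    print("train:", pick_batch_size(TRAIN_GPU_DELTA))
```

Stopping at the first threshold crossing, rather than scanning the whole table, keeps later noisy rows (such as eval bs=128, whose delta dips back to about 0.80) from pulling the pick toward a batch size that is already past saturation.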
## Non-idleness analysis

STABLE_TEST_MODEL: vgg16
Pull Request resolved: https://github.com/pytorch/benchmark/pull/553
Reviewed By: aaronenyeshi
Differential Revision: D32286355
Pulled By: xuzhao9
fbshipit-source-id: 0d1b2a9dde007726691995c26db3ab617ce7dd61