Fix GPU utilization issue for resnet18 (#549)
Summary:
# Eval
## Batch size analysis
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta (vs. previous batch size)
-- | -- | -- | -- | --
1 | 6.831 | 6.234 | 6.836 | -
2 | 8.134 | 7.741 | 8.138 | 0.1907480603
4 | 7.701 | 7.086 | 7.704 | -0.05323334153
8 | 12.355 | 9.24 | 12.362 | 0.6043370991
16 | 23.613 | 7.167 | 23.618 | 0.9112100364
32 | 43.685 | 7.124 | 43.698 | 0.8500402321
64 | 81.349 | 8.187 | 81.362 | 0.8621723704
128 | 157.381 | 8.117 | 157.389 | 0.9346396391

best bs = 8
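
The GPU Delta column is not defined in this summary. Judging from the numbers, it appears to be the relative change in GPU time compared with the previous (half-sized) batch, i.e. `gpu_time[bs] / gpu_time[bs / 2] - 1`. The sketch below recomputes the column from the eval GPU times under that assumption; the helper name `gpu_deltas` is hypothetical and not part of the benchmark code. The same calculation should apply to the train table.

```python
# GPU times copied from the eval table, keyed by batch size.
gpu_time = {
    1: 6.831, 2: 8.134, 4: 7.701, 8: 12.355,
    16: 23.613, 32: 43.685, 64: 81.349, 128: 157.381,
}


def gpu_deltas(times):
    """Relative change in GPU time versus the previous batch size.

    Assumption: this is how the GPU Delta column was derived; the
    definition is inferred from the table values, not from the PR.
    """
    sizes = sorted(times)
    return {
        cur: times[cur] / times[prev] - 1.0
        for prev, cur in zip(sizes, sizes[1:])
    }


deltas = gpu_deltas(gpu_time)
for bs, delta in deltas.items():
    print(f"bs={bs:<4} gpu_delta={delta:.4f}")
```

A delta close to 1.0 means GPU time roughly doubles when the batch size doubles, while a delta close to 0 means the larger batch costs little extra GPU time.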
## Non-idleness analysis

# Train
## Batch size analysis
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta (vs. previous batch size)
-- | -- | -- | -- | --
1 | 63.873 | 59.249 | 63.877 | -
2 | 83.961 | 69.599 | 83.974 | 0.3144990841
4 | 116.895 | 77.58 | 116.897 | 0.3922535463
8 | 171.702 | 115.767 | 171.714 | 0.4688566662
16 | 289.856 | 192.566 | 289.855 | 0.6881340928
32 | 492.707 | 327.83 | 492.707 | 0.6998337105
64 | 974.338 | 648.937 | 974.34 | 0.9775201083
128 | 1879.526 | 1254.384 | 1879.521 | 0.9290287354

best bs = 16
## Non-idleness analysis

STABLE_TEST_MODEL: resnet18
Pull Request resolved: https://github.com/pytorch/benchmark/pull/549
Reviewed By: aaronenyeshi
Differential Revision: D32286385
Pulled By: xuzhao9
fbshipit-source-id: 80868292dff7de7876d1d0ee878f1535d46f53c6