Fix GPU utilization issue of alexnet model (#555)
Summary:
# Eval
## Batch size analysis
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta
-- | -- | -- | -- | --
1 | 2.853 | 1.844 | 2.857 | -
2 | 3.039 | 2.159 | 3.05 | 0.06519453207
4 | 4.43 | 2.173 | 4.434 | 0.4577163541
8 | 6.015 | 2.483 | 6.019 | 0.3577878104
16 | 9.807 | 2.562 | 9.809 | 0.6304239401
32 | 17.844 | 2.646 | 17.849 | 0.8195166718
64 | 29.399 | 2.715 | 29.404 | 0.6475566017
128 | 57.571 | 2.921 | 57.583 | 0.9582638865
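
GPU Delta is the relative increase in GPU time over the previous batch size: (current - previous) / previous.

Best batch size: 16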
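
The GPU Delta column in the table above appears to be the relative change in GPU time between consecutive batch sizes. The following is a minimal sketch that reproduces it from the eval GPU times. The helper name `gpu_delta` and the list name are hypothetical and are not part of the benchmark code.

```python
def gpu_delta(gpu_times):
    """Relative increase in GPU time versus the previous batch size.

    The first entry has no predecessor, so its delta is None
    (shown as "-" in the table).
    """
    return [None] + [
        (cur - prev) / prev for prev, cur in zip(gpu_times, gpu_times[1:])
    ]


# Eval GPU times copied from the table, for batch sizes 1, 2, 4, ..., 128.
eval_gpu_times = [2.853, 3.039, 4.43, 6.015, 9.807, 17.844, 29.399, 57.571]
eval_deltas = gpu_delta(eval_gpu_times)
```

A delta close to 1.0 means that doubling the batch size roughly doubles GPU time, so under this reading the GPU is already saturated and larger batches mostly add latency.
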
## Profiling

# Train
## Batch size analysis
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta
-- | -- | -- | -- | --
1 | 96.571 | 19.375 | 96.619 | -
2 | 106.011 | 24.047 | 106.036 | 0.0977519131
4 | 113.002 | 22.854 | 113.014 | 0.06594598674
8 | 131.276 | 24.553 | 131.301 | 0.161713952
16 | 168.657 | 22.563 | 168.7 | 0.2847512112
32 | 238.722 | 22.459 | 238.748 | 0.4154289475
64 | 372.982 | 24.158 | 373.005 | 0.5624115079
128 | 657.214 | 25.023 | 657.246 | 0.7620528605

Best batch size: 64
## Profiling

STABLE_TEST_MODEL: alexnet
Pull Request resolved: https://github.com/pytorch/benchmark/pull/555
Reviewed By: zou3519, aaronenyeshi
Differential Revision: D32294514
Pulled By: xuzhao9
fbshipit-source-id: 85bf27c75373ec90929241696939830f34926da6