Fix attention_is_all_you_need_pytorch model (#562)
Summary:
# Eval
## Batch size
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta (relative change in GPU Time vs. previous row)
-- | -- | -- | -- | --
1 | 15.988 | 15.91 | 15.996 | -
2 | 18.003 | 17.915 | 18.013 | 0.126032024
4 | 17.858 | 17.782 | 17.87 | -0.008054213187
8 | 16.121 | 16.037 | 16.131 | -0.09726733117
16 | 18.769 | 18.129 | 18.772 | 0.1642578004
32 | 30.616 | 18.463 | 30.619 | 0.6312003836
64 | 57.229 | 18.543 | 57.234 | 0.8692513718
128 | 143.149 | 19.444 | 143.166 | 1.501336735
Best batch size: 32
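
For reference, the GPU Delta column appears to be the relative change in GPU Time from the previous batch size, i.e. `(t_cur - t_prev) / t_prev`. This is inferred from the numbers in the table and not stated in the original spreadsheet. A minimal sketch that reproduces the column from the eval GPU times (the `gpu_delta` helper is hypothetical and not part of the benchmark code):

```python
# Eval GPU times copied from the table above, keyed by batch size.
eval_gpu_time = {
    1: 15.988, 2: 18.003, 4: 17.858, 8: 16.121,
    16: 18.769, 32: 30.616, 64: 57.229, 128: 143.149,
}


def gpu_delta(times):
    """Relative change in GPU time versus the previous (smaller) batch size.

    Assumption: delta = (t_cur - t_prev) / t_prev, inferred from the table.
    The smallest batch size has no predecessor, so it gets no entry.
    """
    sizes = sorted(times)
    return {
        cur: (times[cur] - times[prev]) / times[prev]
        for prev, cur in zip(sizes, sizes[1:])
    }


deltas = gpu_delta(eval_gpu_time)
for batch_size, delta in deltas.items():
    print(f"{batch_size:>3} | {delta:+.6f}")
```

The same helper can be applied to the train GPU times to check that table's GPU Delta column.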
## Profiling

# Train
## Batch size
Batch Size | GPU Time | CPU Dispatch Time | Walltime | GPU Delta (relative change in GPU Time vs. previous row)
-- | -- | -- | -- | --
1 | 60.928 | 54.223 | 60.932 | -
2 | 61.402 | 54.986 | 61.405 | 0.00777967437
4 | 74.171 | 68.628 | 74.175 | 0.2079573955
8 | 68.061 | 61.104 | 68.064 | -0.08237720942
16 | 77.905 | 65.653 | 77.908 | 0.1446349598
32 | 110.292 | 97.916 | 110.293 | 0.4157242796
64 | 207.236 | 195.138 | 207.234 | 0.8789758097
128 | 377.571 | 371.196 | 377.564 | 0.8219373082
Best batch size: 64
## Profiling

STABLE_TEST_MODEL: attention_is_all_you_need_pytorch
Pull Request resolved: https://github.com/pytorch/benchmark/pull/562
Reviewed By: aaronenyeshi
Differential Revision: D32410513
Pulled By: xuzhao9
fbshipit-source-id: 25a9c71b44a4ad0d46f90e78266254414951f9d7