Fix train batch size for attention model. (#593)
Summary:
The original paper link: https://arxiv.org/pdf/1706.03762.pdf
Original hardware platform: 8x NVIDIA P100
Original batch size: 25000 tokens per source and per target
The reference implementation uses a much smaller batch size of 256 tokens; source:
- https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/132907dd272e2cc92e3c10e6c4e783a87ff8893d/README.md?plain=1#L83
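For illustration, a minimal sketch of what the corrected setting looks like in a plain PyTorch training setup. The names `TRAIN_BATCH_SIZE` and `build_train_loader` are hypothetical and are not the identifiers used in pytorch/benchmark; the point is only that the train loader batches with 256, matching the reference implementation's `-b 256` flag rather than the paper's ~25000-token batches.

```python
# Sketch only: not the actual pytorch/benchmark code.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Matches the reference implementation's ``-b 256`` flag (README link above).
# The original paper instead batched ~25000 source and ~25000 target tokens
# per step on 8x P100.
TRAIN_BATCH_SIZE = 256  # hypothetical constant name

def build_train_loader(src: torch.Tensor, tgt: torch.Tensor) -> DataLoader:
    """Hypothetical helper: batch pre-tokenized source/target pairs for training.

    In this toy example the batch unit is sequence pairs; the benchmark's own
    data pipeline defines the actual unit.
    """
    return DataLoader(TensorDataset(src, tgt),
                      batch_size=TRAIN_BATCH_SIZE,
                      shuffle=True)

if __name__ == "__main__":
    # Toy tensors standing in for tokenized sentence pairs.
    src = torch.randint(0, 32000, (1024, 64))
    tgt = torch.randint(0, 32000, (1024, 64))
    loader = build_train_loader(src, tgt)
    print(f"batches per epoch: {len(loader)}")  # 1024 / 256 = 4
```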
Pull Request resolved: https://github.com/pytorch/benchmark/pull/593
Reviewed By: aaronenyeshi
Differential Revision: D32729595
Pulled By: xuzhao9
fbshipit-source-id: 5d30f4db7a6ffe4f2700a5a35928f6b66163568c