Fix train batch size for attention model. (#593)
Summary:
The original paper link: https://arxiv.org/pdf/1706.03762.pdf
Original hardware platform: 8x NVIDIA P100
Original batch size: 25000 tokens per source and per target
The reference implementation uses a much smaller batch size of 256 tokens; source:
- https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/132907dd272e2cc92e3c10e6c4e783a87ff8893d/README.md?plain=1#L83
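For illustration, a minimal sketch of what the corrected setting looks like in a plain PyTorch training setup. The names `TRAIN_BATCH_SIZE` and `build_train_loader` are hypothetical and are not the identifiers used in pytorch/benchmark; the point is only that the train loader batches with 256, matching the reference implementation's `-b 256` flag rather than the paper's ~25000-token batches.

```python
# Sketch only: not the actual pytorch/benchmark code.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Matches the reference implementation's ``-b 256`` flag (README link above).
# The original paper instead batched ~25000 source and ~25000 target tokens
# per step on 8x P100.
TRAIN_BATCH_SIZE = 256  # hypothetical constant name

def build_train_loader(src: torch.Tensor, tgt: torch.Tensor) -> DataLoader:
    """Hypothetical helper: batch pre-tokenized source/target pairs for training.

    In this toy example the batch unit is sequence pairs; the benchmark's own
    data pipeline defines the actual unit.
    """
    return DataLoader(TensorDataset(src, tgt),
                      batch_size=TRAIN_BATCH_SIZE,
                      shuffle=True)

if __name__ == "__main__":
    # Toy tensors standing in for tokenized sentence pairs.
    src = torch.randint(0, 32000, (1024, 64))
    tgt = torch.randint(0, 32000, (1024, 64))
    loader = build_train_loader(src, tgt)
    print(f"batches per epoch: {len(loader)}")  # 1024 / 256 = 4
```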
Pull Request resolved: https://github.com/pytorch/benchmark/pull/593
Reviewed By: aaronenyeshi
Differential Revision: D32729595
Pulled By: xuzhao9
fbshipit-source-id: 5d30f4db7a6ffe4f2700a5a35928f6b66163568c