Add a single GPU variant of modded-nanogpt to torchbench (#2660)
Summary:
X-link: https://github.com/pytorch/pytorch/pull/169505
X-link: https://github.com/pytorch/pytorch/pull/169502
## Tests
Standalone: `python -m torchbenchmark.models.modded_nanogpt.main`
Through dynamo benchmarks: `python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --only modded_nanogpt --disable-cudagraphs`
This PR adds a tweaked version of the Aug 23rd record for the nanogpt speedrun (GPT-2 small variant): https://github.com/KellerJordan/modded-nanogpt/blob/9d9dc969c451c87b7ad3c84f807db2c2d9109f41/train_gpt.py.
The later records cannot be run without building FA3 from source, so we omit them until the dynamo FA3 PR is merged.
The tweaks library-ify the script: everything other than the model class definitions is commented out, the process group initialization is changed to use a fake process group, and some hyperparameters are turned into constants.
The tests run locally, but this model specifically requires H100. I wasn't sure how to filter for that, so I skipped all the tests. This will be tested on the dynamo benchmark side: https://github.com/pytorch/pytorch/pull/169449.
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2660
Reviewed By: BoyuanFeng
Differential Revision: D88233265
Pulled By: xmfan
fbshipit-source-id: 6894823c4593e68d048f59fd05a091d67bf03756