Add nanogpt_train to torchbench, and merge with nanogpt_generate into joint folder (#1911)
Summary:
Currently, we have nanogpt_generate, which is the inference-only version of https://github.com/karpathy/nanoGPT/blob/master/sample.py. This PR adds the training component, nanogpt_train, simplified from https://github.com/karpathy/nanoGPT/blob/master/train.py, and moves both into a joint folder.
### Tests
```
unit tests:
> python test.py -k "test_nanogpt_"
pass
for accuracy check:
> python benchmarks/dynamo/torchbench.py --float16 -dcuda --training --backend=inductor --ddp --multiprocess --accuracy --collect-outputs --only nanogpt
pass
for perf measurement:
> python benchmarks/dynamo/torchbench.py --float16 -dcuda --training --backend=inductor --ddp --multiprocess --performance --only nanogpt
1.547x speedup
```
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1911
Reviewed By: xuzhao9
Differential Revision: D49510254
Pulled By: xmfan
fbshipit-source-id: b9029b93b4eaafa2d50265232340fdf99f0226d9