Add LLAMA (#1446)
Summary:
Fixes https://github.com/pytorch/benchmark/issues/1443
Notable things I had to do (ezyang, hopefully these are OK):
* Since LLaMA requires special permission to download its weights, checkpoints, and tokenizer, I went with randomly initialized checkpoints and a random tokenizer instead; not sure CI qualifies as a valid research endeavour
* I removed the dependency on fairscale, which meant a few adjustments like turning ParallelLinear into Linear and ParallelEmbedding into Embedding, and things mostly seem to work fine. An added bonus is that you can run the example on a single machine (see the first sketch after this list)
* The code ran inference under torch.inference_mode; I removed it since it has a weird interaction with torch.compile (see the second sketch after this list)
* The open source LLaMA repo is inference-only, so there is no training support in this script
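For illustration, here is a minimal sketch of the kind of swap and random initialization described above. The layer names, dimensions, and batch shape are assumptions for the example, not the actual benchmark code:
```
# Hypothetical sketch of the fairscale -> plain PyTorch swap.
# Dimensions and names are illustrative, not the real model code.
import torch
import torch.nn as nn

dim, vocab_size = 4096, 32000

# Before (fairscale, requires distributed/model-parallel initialization):
#   wq = ColumnParallelLinear(dim, dim, bias=False)
#   tok_embeddings = ParallelEmbedding(vocab_size, dim)

# After (plain PyTorch, runs on a single machine):
wq = nn.Linear(dim, dim, bias=False)
tok_embeddings = nn.Embedding(vocab_size, dim)

# With no real checkpoint or tokenizer available, the nn defaults give
# random weights, and random token ids stand in for tokenized text:
batch_size, seq_len = 32, 128
tokens = torch.randint(0, vocab_size, (batch_size, seq_len))
hidden = tok_embeddings(tokens)  # (batch, seq, dim)
```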
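And a hedged sketch of the inference_mode removal, using a stand-in module rather than the real LLaMA model:
```
# Sketch only: `model` is a placeholder module, not the benchmark's model.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(4096, 4096).to(device)
x = torch.randn(32, 4096, device=device)

compiled = torch.compile(model)

# Removed from the script: tensors created under inference_mode can
# interact badly with torch.compile, so a plain no_grad wrapper is used.
# with torch.inference_mode():
#     out = compiled(x)

with torch.no_grad():
    out = compiled(x)  # eval-only forward pass; no training support
```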
Some other things I can improve in another PR:
* Better configuration, including sequence length and batching
* Re-enabling distributed support with fairscale
I can run the code now:
```
(bench) ubuntu@ip-172-31-39-186:~/benchmark$ python run.py llama -d cuda
Running eval method from llama on cuda in eager mode with input batch size 32.
GPU Time: 10.006 milliseconds
CPU Total Wall Time: 10.045 milliseconds
```
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1446
Reviewed By: msaroufim
Differential Revision: D43960031
Pulled By: xuzhao9
fbshipit-source-id: 05d58ff0c92080542a16433ab3eb550322525152