Add LLAMA (#1446)

Summary:
Fixes https://github.com/pytorch/benchmark/issues/1443

Notable things I had to do - ezyang hopefully these are OK:

* Since LLAMA requires special permission to download the weights, checkpoints, and tokenizer, I went ahead with random checkpoints and a random tokenizer - not sure CI qualifies as a valid research endeavour (see the sketch after this summary).
* I removed the dependency on fairscale, so I had to make a few adjustments such as turning ParallelLinear into Linear and ParallelEmbedding into Embedding, and things mostly seem to work fine. An added bonus is that you can run the example on a single machine (see the substitution sketch below).
* The code ran inference under torch.inference_mode(); I removed it since it has a weird interaction with torch.compile (see the sketch below).
* The open-source LLAMA repo is inference-only, so there is no training support in this script.

Some other things I can improve in another PR:

* Better configuration, including sequence length and batching
* Re-enabling distributed support with fairscale

I can run the code now:

```
(bench) ubuntu@ip-172-31-39-186:~/benchmark$ python run.py llama -d cuda
Running eval method from llama on cuda in eager mode with input batch size 32.
GPU Time:            10.006 milliseconds
CPU Total Wall Time: 10.045 milliseconds
```
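A minimal sketch of the random-artifact approach described above. All names, sizes, and helpers here are hypothetical and chosen only to illustrate swapping the gated checkpoint and tokenizer for random data; this is not the benchmark's actual code:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the gated LLAMA artifacts. Instead of the real
# checkpoint and tokenizer, the model is randomly initialized and fed random
# token ids. VOCAB_SIZE and DIM are illustrative values only.
VOCAB_SIZE = 32000
DIM = 4096

def random_checkpoint_model():
    # A freshly constructed module has randomly initialized weights, playing
    # the role of a checkpoint we are not permitted to download.
    return nn.Embedding(VOCAB_SIZE, DIM)

def random_tokens(batch_size=32, seq_len=128):
    # Uniformly random token ids stand in for real tokenizer output.
    return torch.randint(0, VOCAB_SIZE, (batch_size, seq_len))

model = random_checkpoint_model()
out = model(random_tokens())  # shape: (32, 128, 4096)
```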
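The fairscale removal amounts to swapping model-parallel layers for their plain PyTorch equivalents. A sketch of the substitution, with simplified constructor arguments (fairscale's real layers also take sharding options such as gather_output and init_method, and need an initialized process group):

```python
import torch.nn as nn

# Before: sharded layers from fairscale.nn.model_parallel.layers, e.g.
#
#   wq = ColumnParallelLinear(dim, n_heads * head_dim, bias=False,
#                             gather_output=False)
#   tok_embeddings = ParallelEmbedding(vocab_size, dim)
#
# After: unsharded drop-in replacements that run in a single process.
dim, n_heads, head_dim, vocab_size = 4096, 32, 128, 32000

wq = nn.Linear(dim, n_heads * head_dim, bias=False)
tok_embeddings = nn.Embedding(vocab_size, dim)
```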
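And the torch.inference_mode() removal: tensors created under inference mode can trip up torch.compile, so the wrapper was dropped. A sketch of the idea, using torch.no_grad() as a stand-in (whether any no-grad context is kept in the script is an assumption, not confirmed by this PR; the toy model is illustrative):

```python
import torch

model = torch.nn.Linear(64, 64).eval()  # illustrative toy model
compiled = torch.compile(model)

# Before: `with torch.inference_mode():` around the forward pass, which
# interacted badly with torch.compile in this script.
# After: plain no_grad (or no context at all) keeps autograd off without
# producing inference-mode tensors.
with torch.no_grad():
    out = compiled(torch.randn(8, 64))
```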
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1446

Reviewed By: msaroufim

Differential Revision: D43960031

Pulled By: xuzhao9

fbshipit-source-id: 05d58ff0c92080542a16433ab3eb550322525152