0a64c552 - Add DTensor LLaMA inference model: simple_gpt (#1867)

Summary:
Adds simple_gpt + DTensor, implemented in https://github.com/pytorch-labs/simple_gpt/pull/7, to torchbench.

Tested via `python benchmarks/dynamo/torchbench.py -d cuda --output-directory=benchmark_logs --output=performance.csv --inference --performance --timing --print-memory --multiprocess --nothing --only simple_gpt`.

Note: `--nothing` is used here to disable compile, since DTensor + compile isn't yet supported in main.

```
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cuda,simple_gpt,1,0.966153,196.819773,-0.059319,1.000000,4.576880,4.576880,0,0,0,0
cuda,simple_gpt,1,0.967389,196.608152,-0.058833,1.000000,4.577404,4.577404,0,0,0,0
cuda,simple_gpt,1,0.973152,196.093583,-0.059316,1.000000,4.593133,4.593133,0,0,0,0
cuda,simple_gpt,1,0.973087,196.124046,-0.075580,1.000000,4.611483,4.611483,0,0,0,0
cuda,simple_gpt,1,0.967908,193.998484,-0.040192,1.000000,4.593133,4.593133,0,0,0,0
cuda,simple_gpt,1,0.968949,193.798088,-0.028878,1.000000,4.593133,4.593133,0,0,0,0
```

Two changes were required to the model (a rough sketch follows below):
- Decorate the cache setup with `torch.no_grad()`. Previously this was done outside the model: the entire eval call was wrapped in a `torch.no_grad()` context. After using torchbench, I noticed that even in inference-only mode, gradient calculations are not disabled.
- Rank/world size: added support on the torchbench side in https://github.com/pytorch/pytorch/pull/108438 and updated the model to fetch them from the provided extra_args.

Pull Request resolved: https://github.com/pytorch/benchmark/pull/1867

Reviewed By: msaroufim

Differential Revision: D49065244

Pulled By: xmfan

fbshipit-source-id: d4709fa3997c6a25c75e87eff7c13492b370b1af
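For reference, a minimal sketch of the two model changes, assuming hypothetical names (`Transformer.setup_caches`, `parse_extra_args`, and the `--rank`/`--world_size` flags) rather than the exact simple_gpt code:

```python
import argparse

import torch
import torch.nn as nn


class Transformer(nn.Module):
    # Change 1: disable grad tracking inside the model itself; previously the
    # entire eval call was wrapped in torch.no_grad() outside the model, and
    # torchbench's inference path does not do that for you.
    @torch.no_grad()
    def setup_caches(self, max_batch_size: int, max_seq_length: int):
        # hypothetical KV-cache shape, for illustration only
        self.kv_cache = torch.zeros(max_batch_size, max_seq_length, 8, 64)


def parse_extra_args(extra_args):
    # Change 2: fetch rank/world size from the extra_args torchbench forwards
    # (support added in https://github.com/pytorch/pytorch/pull/108438);
    # the flag names here are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--rank", type=int, default=0)
    parser.add_argument("--world_size", type=int, default=1)
    args, _ = parser.parse_known_args(extra_args)
    return args.rank, args.world_size
```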