Add new model: simple_gpt_tp_manual (#1969)
Summary:
Similar to simple_gpt, but instead of using the DTensor API to apply Tensor Parallelism (TP), this model shards the weights manually and calls functional collectives directly (see the sketch after the list below). Two main reasons it is beneficial to add this:
1. DTensor + compile is not ready yet
2. DTensor has CPU overhead, and adding this lower-overhead model will help us track improvements and regressions
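For illustration, a minimal sketch (not the benchmark's actual code) of what manual TP with pre-sharded weights and functional collectives can look like; the class names and the `tp_group` argument are hypothetical, and only torch.distributed APIs known to exist are used:

    # Assumes torch.distributed is already initialized and `tp_group` is the TP process group.
    import torch
    import torch.nn as nn
    import torch.distributed as dist
    import torch.distributed._functional_collectives as funcol

    class ColumnParallelLinear(nn.Module):
        # Linear layer whose weight is sharded along the output dimension.
        def __init__(self, in_features, out_features, tp_group):
            super().__init__()
            world_size = dist.get_world_size(tp_group)
            assert out_features % world_size == 0
            # Each rank only materializes its shard of the weight.
            self.weight = nn.Parameter(torch.empty(out_features // world_size, in_features))

        def forward(self, x):
            # No communication needed here; outputs are column-sharded across ranks.
            return torch.nn.functional.linear(x, self.weight)

    class RowParallelLinear(nn.Module):
        # Linear layer whose weight is sharded along the input dimension.
        def __init__(self, in_features, out_features, tp_group):
            super().__init__()
            world_size = dist.get_world_size(tp_group)
            assert in_features % world_size == 0
            self.tp_group = tp_group
            self.weight = nn.Parameter(torch.empty(out_features, in_features // world_size))

        def forward(self, x):
            # Partial results are summed across the TP group with a functional all_reduce,
            # which returns a new tensor (no in-place mutation) and composes with torch.compile.
            partial = torch.nn.functional.linear(x, self.weight)
            return funcol.all_reduce(partial, reduceOp="sum", group=self.tp_group)

Compared to the DTensor path, there is no per-op dispatch through the DTensor layer: each rank holds plain tensors for its shard and the only distributed work is the explicit all_reduce, which is why this variant has lower CPU overhead.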
Tests:
In benchmark/:
python test.py -k "test_simple_gpt_manual_tp_"
In pytorch/:
PYTHONPATH=benchmark/ python pytorch/benchmarks/dynamo/torchbench.py --float16 -dcuda --inference --backend=inductor --multiprocess --performance --only simple_gpt_tp_manual
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1969
Reviewed By: xuzhao9
Differential Revision: D50130401
Pulled By: xmfan
fbshipit-source-id: cd4b5e543919024ff6c42c6fccfc0b12367d9bb2