ddp+dynamo: assorted fixes (#1223)

Commit

3 years ago

ddp+dynamo: assorted fixes (#1223)

Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1223

* fix requeue, so that when a job gets preempted, the job can be resubmitted.
* add ADAM_CAPTURABLE to force using the capturable optimizer
* Assert that torch.version.debug == False, because debug=True is bad for benchmarking and particularly bad for NCCL performance for some reason

Example of output:

with `--csv_out`:
```
$ python userbenchmark/ddp_experiments/parse_ddp.py --results_dir logs --csv ddp_experiments_20221010-214124.csv --csv_out
model,has_ddp_breaks,backend,1-node
torchbenchmark.models.hf_T5.Model,False,torchdynamo_inductor,340.344
torchbenchmark.models.resnet50.Model,False,torchdynamo_inductor,34.228
torchbenchmark.models.hf_T5.Model,True,torchdynamo_inductor,342.207
torchbenchmark.models.resnet50.Model,True,torchdynamo_inductor,33.529
```

without `--csv_out`:
```
$ python userbenchmark/ddp_experiments/parse_ddp.py --results_dir logs --csv ddp_experiments_20221010-214124.csv
hf_T5:
backend                           1_latency
------------------------------  -----------
torchdynamo_inductor wo/breaks      340.344
torchdynamo_inductor w/breaks       342.207
resnet50:
backend                           1_latency
------------------------------  -----------
torchdynamo_inductor wo/breaks       34.228
torchdynamo_inductor w/breaks        33.529
```

Test Plan: Imported from OSS

Reviewed By: xuzhao9

Differential Revision: D40248168

Pulled By: davidberard98

fbshipit-source-id: c5ac1bf01a60bead4df372525a89e64badb4c148
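For readers who want to post-process the `--csv_out` output, the relationship between the CSV rows and the per-model tables can be sketched as below. This is a minimal illustration using only the Python standard library and the sample rows quoted in the commit message; it is not the code in `parse_ddp.py`, and the helper name `group_latencies` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the `--csv_out` example in the commit message.
CSV_TEXT = """\
model,has_ddp_breaks,backend,1-node
torchbenchmark.models.hf_T5.Model,False,torchdynamo_inductor,340.344
torchbenchmark.models.resnet50.Model,False,torchdynamo_inductor,34.228
torchbenchmark.models.hf_T5.Model,True,torchdynamo_inductor,342.207
torchbenchmark.models.resnet50.Model,True,torchdynamo_inductor,33.529
"""


def group_latencies(csv_text):
    """Group 1-node latencies by short model name and backend label.

    Hypothetical helper for illustration; not part of parse_ddp.py.
    """
    grouped = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "torchbenchmark.models.hf_T5.Model" -> "hf_T5"
        short_name = row["model"].split(".")[-2]
        # Mirror the "w/breaks" and "wo/breaks" labels from the table output.
        suffix = "w/breaks" if row["has_ddp_breaks"] == "True" else "wo/breaks"
        label = f'{row["backend"]} {suffix}'
        grouped[short_name][label] = float(row["1-node"])
    return dict(grouped)


if __name__ == "__main__":
    for model, rows in group_latencies(CSV_TEXT).items():
        print(f"{model}:")
        for label, latency in rows.items():
            print(f"  {label:<30}  {latency:>11.3f}")
```

Each CSV row maps to one table row: the `has_ddp_breaks` column selects the `w/breaks` or `wo/breaks` suffix, and the `1-node` column supplies the latency value.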

Author

davidberard98

Committer

facebook-github-bot

Parents

9ca9f58e

benchmark f320cda1 - ddp+dynamo: assorted fixes (#1223)
