ddp+dynamo: assorted fixes (#1223)
Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1223
* fix requeue so that a preempted job can be resubmitted
* add ADAM_CAPTURABLE to force use of the capturable optimizer
* assert that torch.version.debug == False, because debug builds are
unsuitable for benchmarking and are particularly bad for NCCL
performance (the cause is not yet understood)
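A minimal sketch of the last two items, not the code in this PR. It assumes ADAM_CAPTURABLE is read as an environment variable and that any value other than an empty string, "0", or "false" enables it. The helper names `adam_extra_kwargs` and `assert_release_build` are hypothetical.

```python
import os


def adam_extra_kwargs(environ=os.environ):
    """Return extra keyword arguments to pass to torch.optim.Adam.

    Assumption: ADAM_CAPTURABLE is an environment variable, and any value
    other than "", "0" or "false" turns the capturable optimizer on.
    """
    flag = environ.get("ADAM_CAPTURABLE", "").strip().lower()
    if flag not in ("", "0", "false"):
        return {"capturable": True}
    return {}


def assert_release_build(debug_flag):
    """Refuse to benchmark when the given debug flag is set.

    The caller is expected to pass torch.version.debug.
    """
    assert not debug_flag, (
        "torch.version.debug is True; use a release build for benchmarking"
    )
```

With PyTorch installed, this could be used as `assert_release_build(torch.version.debug)` followed by `torch.optim.Adam(model.parameters(), **adam_extra_kwargs())`.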
Example output from parse_ddp.py:
with `--csv_out`:
```
$ python userbenchmark/ddp_experiments/parse_ddp.py --results_dir logs --csv ddp_experiments_20221010-214124.csv --csv_out
model,has_ddp_breaks,backend,1-node
torchbenchmark.models.hf_T5.Model,False,torchdynamo_inductor,340.344
torchbenchmark.models.resnet50.Model,False,torchdynamo_inductor,34.228
torchbenchmark.models.hf_T5.Model,True,torchdynamo_inductor,342.207
torchbenchmark.models.resnet50.Model,True,torchdynamo_inductor,33.529
```
without `--csv_out`:
```
$ python userbenchmark/ddp_experiments/parse_ddp.py --results_dir logs --csv ddp_experiments_20221010-214124.csv
hf_T5:
backend                           1_latency
------------------------------  -----------
torchdynamo_inductor wo/breaks      340.344
torchdynamo_inductor w/breaks       342.207
resnet50:
backend                           1_latency
------------------------------  -----------
torchdynamo_inductor wo/breaks       34.228
torchdynamo_inductor w/breaks        33.529
```
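The two views above carry the same data. The following sketch shows one way the CSV rows could be grouped into the per-model form. It is an illustration only, not the implementation in parse_ddp.py, and the function name `group_by_model` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# The CSV text is copied from the `--csv_out` example above.
CSV_TEXT = """\
model,has_ddp_breaks,backend,1-node
torchbenchmark.models.hf_T5.Model,False,torchdynamo_inductor,340.344
torchbenchmark.models.resnet50.Model,False,torchdynamo_inductor,34.228
torchbenchmark.models.hf_T5.Model,True,torchdynamo_inductor,342.207
torchbenchmark.models.resnet50.Model,True,torchdynamo_inductor,33.529
"""


def group_by_model(csv_text):
    """Group CSV rows into {short model name: [(backend label, latency)]}."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "torchbenchmark.models.hf_T5.Model" -> "hf_T5"
        short_name = row["model"].split(".")[-2]
        suffix = "w/breaks" if row["has_ddp_breaks"] == "True" else "wo/breaks"
        label = f'{row["backend"]} {suffix}'
        grouped[short_name].append((label, float(row["1-node"])))
    return dict(grouped)


if __name__ == "__main__":
    for model, rows in group_by_model(CSV_TEXT).items():
        print(f"{model}:")
        for label, latency in rows:
            print(f"  {label:<30}  {latency:>11.3f}")
```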
Test Plan: Imported from OSS
Reviewed By: xuzhao9
Differential Revision: D40248168
Pulled By: davidberard98
fbshipit-source-id: c5ac1bf01a60bead4df372525a89e64badb4c148