Add distributed correctness checks (#1294)
Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1294
DDP+Dynamo experiments run in separate processes, and DDP wrapping also happens after the general correctness check, so they need to be handled separately.
Process:
1) When setting up the measurements in `ddp_experiments/__init__.py`, categorize some measurements as "reference" (e.g. the eager measurements) and others as "test". Also assign a file path where the reference correctness results will be stored.
2) For reference measurements, run an initial correctness measurement and dump the results to the file. For test measurements, load the reference results from the file and compare the test measurements against them.
3) Make sure to run the correctness measurement on all ranks, even if we're only doing the correctness check on rank 0.
4) Make sure to disable non-distributed correctness checks to avoid an additional iteration that might affect dynamo and eager parameters differently.
Currently hf_T5, hf_T5_large, hf_Bert, hf_GPT2_large, and timm_vision_transformer are passing; ~resnet50 is failing. Still investigating the resnet50 issue.~ resnet50 is also failing correctness with `python run.py resnet50 -t train -d cuda --torchdynamo inductor`, so this isn't a DDP-specific problem.
Usage:
```
python userbenchmark/ddp_experiments/__init__.py --job_dir /fsx/path/to/dir/shared/across/cluster --check_correctness_distributed
```
i.e. adding `--check_correctness_distributed` enables the correctness checks. Note that these checks shouldn't be used if you care about performance: we modify the models with `model.eval()` to get rid of dropouts, so the run is not representative of actual performance. Since performance presumably doesn't matter when this option is used, we also reduce the number of iterations to speed up the test.
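A toy illustration (not the benchmark code) of why `model.eval()` matters here: with dropout active, identical inputs produce different outputs across runs, so eager and dynamo outputs could not be compared; with dropout disabled, the forward pass is deterministic.

```python
import random

def dropout_forward(x, training, p=0.5, rng=None):
    """Minimal stand-in for a layer with dropout; purely illustrative."""
    if not training:
        return list(x)  # eval mode: dropout is the identity
    rng = rng or random.Random()
    # train mode: zero each element with probability p, rescale survivors
    return [0.0 if rng.random() < p else v / (1 - p) for v in x]

x = [1.0, 2.0, 3.0, 4.0]
# Eval-mode outputs are reproducible across runs, so they are safe to
# compare between the reference and test measurements.
assert dropout_forward(x, training=False) == dropout_forward(x, training=False)
```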
Test Plan: Imported from OSS
Reviewed By: wconstab
Differential Revision: D41312110
Pulled By: davidberard98
fbshipit-source-id: c393923e2eac89209418abde5acb897fd382b6ba