Add userbenchmark metrics to the distributed benchmark (#954)
Summary:
Test command, run on an AWS cluster (8x A100 GPUs):
```
python run_benchmark.py distributed --ngpus 8 --partition train --job_dir $PWD/.userbenchmark/distributed/logs
```
Output metrics JSON file:
```
{
    "name": "distributed",
    "environ": {
        "pytorch_git_version": "367ce697da444978ab49ed7426c1ffee57d1e88b"
    },
    "args": {
        "ngpus": 8,
        "nodes": 1,
        "timeout": 1440,
        "profiler": false,
        "partition": "train",
        "job_dir": "/data/home/xzhao9/benchmark/.userbenchmark/distributed/logs",
        "model": "torchbenchmark.e2e_models.hf_bert.Model",
        "trainer": "torchbenchmark.util.distributed.ddp.DDPTrainer",
        "dist_url": "file:///data/home/xzhao9/benchmark/.userbenchmark/distributed/logs/300741a6a31248b5af287dad433b9200_init",
        "output_dir": "/data/home/xzhao9/benchmark/.userbenchmark/distributed/logs"
    },
    "metrics": {
        "0-fwd_mean": 18.587260818481447,
        "0-fwd_stdev": 1.249975396652668,
        "0-bwd_mean": 22.979004859924316,
        "0-bwd_stdev": 0.17279135941652868,
        "0-opt_mean": 17.845138931274413,
        "0-opt_stdev": 0.10701804960748133,
        "1-fwd_mean": 14.643673610687255,
        "1-fwd_stdev": 0.12621190012201905,
        "1-bwd_mean": 31.684182357788085,
        "1-bwd_stdev": 1.6110578055095222,
        "1-opt_mean": 12.702611064910888,
        "1-opt_stdev": 0.11127842204111327,
        "2-fwd_mean": 13.7170880317688,
        "2-fwd_stdev": 0.3072370404256818,
        "2-bwd_mean": 33.75406379699707,
        "2-bwd_stdev": 1.2665815412703216,
        "2-opt_mean": 11.773264026641845,
        "2-opt_stdev": 0.08454682507453214,
        "3-fwd_mean": 13.955654430389405,
        "3-fwd_stdev": 0.28808249829358984,
        "3-bwd_mean": 33.470188522338866,
        "3-bwd_stdev": 1.2093597218183398,
        "3-opt_mean": 11.838454341888427,
        "3-opt_stdev": 0.13043112330328627,
        "4-fwd_mean": 14.69996166229248,
        "4-fwd_stdev": 2.096843783502309,
        "4-bwd_mean": 33.06722240447998,
        "4-bwd_stdev": 2.384150208768099,
        "4-opt_mean": 11.545113658905029,
        "4-opt_stdev": 0.17722284388649193,
        "5-fwd_mean": 14.81980791091919,
        "5-fwd_stdev": 0.1175716373064526,
        "5-bwd_mean": 32.68328037261963,
        "5-bwd_stdev": 1.3603798665482199,
        "5-opt_mean": 11.662390422821044,
        "5-opt_stdev": 0.06255000873065251,
        "6-fwd_mean": 14.148975849151611,
        "6-fwd_stdev": 0.10553216426892825,
        "6-bwd_mean": 33.048796844482425,
        "6-bwd_stdev": 1.3651520012370102,
        "6-opt_mean": 12.033942317962646,
        "6-opt_stdev": 0.039871346805974116,
        "7-fwd_mean": 14.514809608459473,
        "7-fwd_stdev": 0.7432848123236139,
        "7-bwd_mean": 31.55668125152588,
        "7-bwd_stdev": 1.5674374937909337,
        "7-opt_mean": 13.237116622924805,
        "7-opt_stdev": 1.2275919701313986
    }
}
```
This metrics output can be used to update the internal performance dashboard and track performance.
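A minimal sketch of how a consumer might reshape the flat metric keys before uploading them. It is not part of this change. It assumes the numeric key prefix is the per-GPU rank and that `fwd`, `bwd`, and `opt` are the forward, backward, and optimizer phases; the helper names `group_metrics_by_rank` and `mean_across_ranks` are hypothetical.

```python
import json
from collections import defaultdict
from statistics import mean


def group_metrics_by_rank(metrics):
    """Turn flat '<rank>-<phase>_<stat>' keys into {rank: {phase: {stat: value}}}.

    Assumption: the number before the dash is the rank (GPU index).
    """
    grouped = defaultdict(lambda: defaultdict(dict))
    for key, value in metrics.items():
        rank, _, rest = key.partition("-")       # "0-fwd_mean" -> "0", "fwd_mean"
        phase, _, stat = rest.rpartition("_")    # "fwd_mean"   -> "fwd", "mean"
        grouped[int(rank)][phase][stat] = value
    return grouped


def mean_across_ranks(grouped, phase):
    """Average the per-rank '<phase>_mean' values."""
    return mean(per_rank[phase]["mean"] for per_rank in grouped.values())


# Two-rank excerpt of the "metrics" object shown above.
sample = json.loads("""
{
    "metrics": {
        "0-fwd_mean": 18.587260818481447,
        "0-fwd_stdev": 1.249975396652668,
        "1-fwd_mean": 14.643673610687255,
        "1-fwd_stdev": 0.12621190012201905
    }
}
""")

grouped = group_metrics_by_rank(sample["metrics"])
fwd_avg = mean_across_ranks(grouped, "fwd")
print(f"mean fwd across {len(grouped)} ranks: {fwd_avg:.3f}")
```

Grouping by rank first makes it easy to spot outliers, such as rank 0 in the run above, before collapsing the values into a single dashboard number.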
Pull Request resolved: https://github.com/pytorch/benchmark/pull/954
Reviewed By: mrshenli
Differential Revision: D37078673
Pulled By: xuzhao9
fbshipit-source-id: 694c9939b1caaaa567629fedd63d08e3016970e8