Fix Hugging Face model issue with the distributed benchmark (#1189)
Summary:
Fixes https://github.com/pytorch/benchmark/issues/1174
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1189
Test Plan:
```
python run_benchmark.py distributed --ngpus 8 --nodes 1 --model torchbenchmark.models.hf_T5.Model --trainer torchbenchmark.util.distributed.core_model.trainer.Trainer --distributed ddp --job_dir $PWD/.userbenchmark/distributed
```
Output:
```
{
  "name": "distributed",
  "environ": {
    "pytorch_git_version": "0feda8a4ba4c9fc395186686c74152e12cc5c63e"
  },
  "args": {
    "ngpus": 8,
    "nodes": 1,
    "timeout": 1440,
    "profiler": false,
    "partition": "train",
    "cluster": null,
    "job_dir": "/data/home/xzhao9/benchmark/.userbenchmark/distributed/",
    "model": "torchbenchmark.models.hf_T5.Model",
    "trainer": "torchbenchmark.util.distributed.core_model.trainer.Trainer",
    "distributed": "ddp",
    "dist_url": "file:///data/home/xzhao9/benchmark/.userbenchmark/distributed/52a9247de6324bd385612e697e077a7c_init",
    "output_dir": "/data/home/xzhao9/benchmark/.userbenchmark/distributed/",
    "extra_args": []
  },
  "metrics": {
    "0-latency_median": 398.9452362060547,
    "0-latency_stdev": 1.2685990462640786,
    "1-latency_median": 398.97906494140625,
    "1-latency_stdev": 1.2198633774230698,
    "2-latency_median": 398.9121856689453,
    "2-latency_stdev": 1.0341724509167218,
    "3-latency_median": 399.0419158935547,
    "3-latency_stdev": 1.1147936411123855,
    "4-latency_median": 398.67335510253906,
    "4-latency_stdev": 1.1395044277789603,
    "5-latency_median": 399.1611328125,
    "5-latency_stdev": 1.684352926731195,
    "6-latency_median": 399.2419891357422,
    "6-latency_stdev": 1.2517432020509627,
    "7-latency_median": 398.93389892578125,
    "7-latency_stdev": 1.3721978229900935
  }
}
```
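To sanity-check output like the above, a small script can group the metrics by their numeric prefix and report how far apart the medians are. This is only a sketch: it assumes the prefix in keys such as `0-latency_median` is the rank index, and `per_rank_medians` is a hypothetical helper, not part of the benchmark repository. The sample reuses two of the reported values.

```python
import json


def per_rank_medians(result: dict) -> dict[int, float]:
    """Collect `<prefix>-latency_median` entries keyed by integer prefix.

    Assumption: the numeric prefix is the rank index.
    """
    medians = {}
    for key, value in result["metrics"].items():
        prefix, _, metric = key.partition("-")
        if metric == "latency_median":
            medians[int(prefix)] = value
    return medians


# Trimmed sample containing two of the entries reported above.
sample = json.loads("""
{
  "metrics": {
    "0-latency_median": 398.9452362060547,
    "0-latency_stdev": 1.2685990462640786,
    "4-latency_median": 398.67335510253906,
    "4-latency_stdev": 1.1395044277789603
  }
}
""")

medians = per_rank_medians(sample)
# Gap between the slowest and fastest median, in the same unit as the input.
spread = max(medians.values()) - min(medians.values())
print(f"ranks={sorted(medians)} spread={spread:.3f}")
```

A small spread relative to the medians suggests the processes ran in step with each other.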
Reviewed By: davidberard98
Differential Revision: D39595197
Pulled By: xuzhao9
fbshipit-source-id: 677044e9065622277bd81495becbd99a4c90c117