benchmark
26e450c6 - Fix Huggingface model issue with distributed (#1189)

Commit
3 years ago
Fix Huggingface model issue with distributed (#1189)

Summary:
Fixes https://github.com/pytorch/benchmark/issues/1174

Pull Request resolved: https://github.com/pytorch/benchmark/pull/1189

Test Plan:
```
python run_benchmark.py distributed --ngpus 8 --nodes 1 --model torchbenchmark.models.hf_T5.Model --trainer torchbenchmark.util.distributed.core_model.trainer.Trainer --distributed ddp --job_dir $PWD/.userbenchmark/distributed
```

Output:
```
{
  "name": "distributed",
  "environ": {
    "pytorch_git_version": "0feda8a4ba4c9fc395186686c74152e12cc5c63e"
  },
  "args": {
    "ngpus": 8,
    "nodes": 1,
    "timeout": 1440,
    "profiler": false,
    "partition": "train",
    "cluster": null,
    "job_dir": "/data/home/xzhao9/benchmark/.userbenchmark/distributed/",
    "model": "torchbenchmark.models.hf_T5.Model",
    "trainer": "torchbenchmark.util.distributed.core_model.trainer.Trainer",
    "distributed": "ddp",
    "dist_url": "file:///data/home/xzhao9/benchmark/.userbenchmark/distributed/52a9247de6324bd385612e697e077a7c_init",
    "output_dir": "/data/home/xzhao9/benchmark/.userbenchmark/distributed/",
    "extra_args": []
  },
  "metrics": {
    "0-latency_median": 398.9452362060547,
    "0-latency_stdev": 1.2685990462640786,
    "1-latency_median": 398.97906494140625,
    "1-latency_stdev": 1.2198633774230698,
    "2-latency_median": 398.9121856689453,
    "2-latency_stdev": 1.0341724509167218,
    "3-latency_median": 399.0419158935547,
    "3-latency_stdev": 1.1147936411123855,
    "4-latency_median": 398.67335510253906,
    "4-latency_stdev": 1.1395044277789603,
    "5-latency_median": 399.1611328125,
    "5-latency_stdev": 1.684352926731195,
    "6-latency_median": 399.2419891357422,
    "6-latency_stdev": 1.2517432020509627,
    "7-latency_median": 398.93389892578125,
    "7-latency_stdev": 1.3721978229900935
  }
}
```

Reviewed By: davidberard98

Differential Revision: D39595197

Pulled By: xuzhao9

fbshipit-source-id: 677044e9065622277bd81495becbd99a4c90c117
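To sanity-check an output like the one above, it can help to compare the per-rank medians directly. The following is a minimal sketch, not part of the benchmark repository: the helper name `per_rank_medians` is hypothetical, and it assumes only that metric keys follow the `<rank>-latency_median` / `<rank>-latency_stdev` pattern seen in the "metrics" object. The values are copied from the output above.

```python
import re

# Values copied from the "metrics" object in the output above.
metrics = {
    "0-latency_median": 398.9452362060547,
    "0-latency_stdev": 1.2685990462640786,
    "1-latency_median": 398.97906494140625,
    "1-latency_stdev": 1.2198633774230698,
    "2-latency_median": 398.9121856689453,
    "2-latency_stdev": 1.0341724509167218,
    "3-latency_median": 399.0419158935547,
    "3-latency_stdev": 1.1147936411123855,
    "4-latency_median": 398.67335510253906,
    "4-latency_stdev": 1.1395044277789603,
    "5-latency_median": 399.1611328125,
    "5-latency_stdev": 1.684352926731195,
    "6-latency_median": 399.2419891357422,
    "6-latency_stdev": 1.2517432020509627,
    "7-latency_median": 398.93389892578125,
    "7-latency_stdev": 1.3721978229900935,
}


def per_rank_medians(metrics):
    """Hypothetical helper: map rank -> median latency.

    Assumes keys of the form '<rank>-latency_median'; all other keys
    (for example the '-latency_stdev' entries) are ignored.
    """
    pattern = re.compile(r"^(\d+)-latency_median$")
    medians = {}
    for key, value in metrics.items():
        match = pattern.match(key)
        if match:
            medians[int(match.group(1))] = value
    return medians


medians = per_rank_medians(metrics)
slowest = max(medians, key=medians.get)  # rank with the largest median
fastest = min(medians, key=medians.get)  # rank with the smallest median
spread = medians[slowest] - medians[fastest]

print(f"ranks: {len(medians)}, fastest: {fastest}, slowest: {slowest}, spread: {spread:.3f}")
```

For this run the eight medians differ by roughly 0.57 (rank 4 is the fastest, rank 6 the slowest), which is smaller than any single rank's reported standard deviation, so the ranks appear well balanced.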