benchmark
76e464cc - Revert changes to distributed/trainer.py from #1135 (#1144)

Commit

3 years ago

Revert changes to distributed/trainer.py from #1135 (#1144)

Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1144

distributed/trainer.py is for e2e models; distributed/core_model/trainer.py, which works with core models, already exists.

Verified with:

```
python run_benchmark.py distributed --ngpus 2 --nodes 1 --model torchbenchmark.e2e_models.hf_bert.Model --trainer torchbenchmark.util.distributed.trainer.Trainer --distributed ddp --job_dir $PWD/.userbenchmark/distributed/e2e_hf_bert
```

(this one failed with `AssertionError: Only one thread should be running DDP at a time`, but I believe this is a known DDP issue)

and

```
python run_benchmark.py distributed --ngpus 2 --nodes 1 --model torchbenchmark.models.resnet50.Model --trainer torchbenchmark.util.distributed.core_model.trainer.Trainer --distributed ddp --job_dir $PWD/.userbenchmark/distributed/resnet50
```

Test Plan: Imported from OSS

Reviewed By: xuzhao9

Differential Revision: D39152488

Pulled By: davidberard98

fbshipit-source-id: 3e89f76c9cc44505fd548f95eb55f902a43e3d61
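The two verification runs in this commit differ only in `--model`, `--trainer`, and `--job_dir`: the e2e model (`hf_bert`) is paired with `torchbenchmark.util.distributed.trainer.Trainer`, while the core model (`resnet50`) is paired with `torchbenchmark.util.distributed.core_model.trainer.Trainer`. The sketch below makes that pairing explicit by building the same command lines. It is only an illustration: the `TRAINERS` mapping and the `build_command` helper are hypothetical and not part of torchbenchmark. The flags are copied verbatim from the commands above, and `$PWD` is replaced with the current working directory so the arguments can be passed to `subprocess` without a shell.

```python
import os
import shlex

# Hypothetical mapping (not part of torchbenchmark), taken from the two
# verification commands: e2e models use distributed/trainer.py, core models
# use distributed/core_model/trainer.py.
TRAINERS = {
    "e2e": "torchbenchmark.util.distributed.trainer.Trainer",
    "core": "torchbenchmark.util.distributed.core_model.trainer.Trainer",
}


def build_command(model: str, kind: str, job_name: str,
                  ngpus: int = 2, nodes: int = 1) -> list:
    """Hypothetical helper: return the argv for a distributed DDP run."""
    # Equivalent of $PWD/.userbenchmark/distributed/<job_name>, resolved here
    # because no shell will expand $PWD when the list goes to subprocess.
    job_dir = os.path.join(os.getcwd(), ".userbenchmark", "distributed", job_name)
    return [
        "python", "run_benchmark.py", "distributed",
        "--ngpus", str(ngpus),
        "--nodes", str(nodes),
        "--model", model,
        "--trainer", TRAINERS[kind],  # KeyError for an unknown model kind
        "--distributed", "ddp",
        "--job_dir", job_dir,
    ]


if __name__ == "__main__":
    # Print the two command lines used to verify this commit.
    print(shlex.join(build_command(
        "torchbenchmark.e2e_models.hf_bert.Model", "e2e", "e2e_hf_bert")))
    print(shlex.join(build_command(
        "torchbenchmark.models.resnet50.Model", "core", "resnet50")))
```

Passing the returned list to `subprocess.run(...)` from the repository root should be equivalent to typing the commands by hand. Keying the trainer on the model kind makes it harder to pair an e2e model with the core-model trainer, or the reverse, which is the mix-up this revert addresses.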

Author

davidberard98

Committer

facebook-github-bot

Parents

c678454d
