benchmark
b6b19a9f - Support DDP in the core model set (#1031)

Commit
3 years ago
Support DDP in the core model set (#1031)

Summary:
Example command:

```
python run_benchmark.py distributed --ngpus 8 --nodes 1 --model torchbenchmark.models.hf_Bert.Model --trainer torchbenchmark.util.distributed.core_model.trainer.Trainer --distributed ddp --job_dir $PWD/.userbenchmark/distributed/logs_eager
```

Output:

```
{
    "name": "distributed",
    "environ": {
        "pytorch_git_version": "5728ca13aef459e71cee062eb872ab217dfa5742"
    },
    "args": {
        "ngpus": 8,
        "nodes": 1,
        "timeout": 1440,
        "profiler": false,
        "partition": "train",
        "cluster": null,
        "job_dir": "/data/home/xzhao9/benchmark/.userbenchmark/distributed/logs_eager",
        "model": "torchbenchmark.models.hf_Bert.Model",
        "trainer": "torchbenchmark.util.distributed.core_model.trainer.Trainer",
        "distributed": "ddp",
        "dist_url": "file:///data/home/xzhao9/benchmark/.userbenchmark/distributed/logs_eager/f4960836279846a88e9bee2202fb226e_init",
        "output_dir": "/data/home/xzhao9/benchmark/.userbenchmark/distributed/logs_eager"
    },
    "metrics": {
        "0-latency_median": 375.4787841796875,
        "0-latency_stdev": 0.5074592125167986,
        "1-latency_median": 375.49486389160154,
        "1-latency_stdev": 0.5890401056068594,
        "2-latency_median": 375.4880035400391,
        "2-latency_stdev": 0.5707920820804092,
        "3-latency_median": 375.48769226074216,
        "3-latency_stdev": 0.5835954020419492,
        "4-latency_median": 375.4671112060547,
        "4-latency_stdev": 0.49707192777934556,
        "5-latency_median": 375.49219970703126,
        "5-latency_stdev": 0.5600655620421927,
        "6-latency_median": 375.4905609130859,
        "6-latency_stdev": 0.5482310737142803,
        "7-latency_median": 375.4790863037109,
        "7-latency_stdev": 0.5190043980938861
    }
}
```

Example command 2:

```
python run_benchmark.py distributed --ngpus 8 --nodes 1 --model torchbenchmark.models.hf_Bert.Model --trainer torchbenchmark.util.distributed.core_model.trainer.Trainer --distributed ddp --torchdynamo aot_autograd_speedup_strategy --job_dir $PWD/.userbenchmark/distributed/logs_torchdynamo
```

Output:

```
{
    "name": "distributed",
    "environ": {
        "pytorch_git_version": "5728ca13aef459e71cee062eb872ab217dfa5742"
    },
    "args": {
        "ngpus": 8,
        "nodes": 1,
        "timeout": 1440,
        "profiler": false,
        "partition": "train",
        "cluster": null,
        "job_dir": "/data/home/xzhao9/benchmark/.userbenchmark/distributed/logs_torchdynamo",
        "model": "torchbenchmark.models.hf_Bert.Model",
        "trainer": "torchbenchmark.util.distributed.core_model.trainer.Trainer",
        "distributed": "ddp",
        "dist_url": "file:///data/home/xzhao9/benchmark/.userbenchmark/distributed/logs_torchdynamo/92b3d2fbd2984cf3aa620e790edf280d_init",
        "output_dir": "/data/home/xzhao9/benchmark/.userbenchmark/distributed/logs_torchdynamo"
    },
    "metrics": {
        "0-latency_median": 362.51340637207034,
        "0-latency_stdev": 2.6765847673834227,
        "1-latency_median": 362.52191162109375,
        "1-latency_stdev": 2.6585795546458573,
        "2-latency_median": 362.55426330566405,
        "2-latency_stdev": 2.641738105760016,
        "3-latency_median": 362.5112548828125,
        "3-latency_stdev": 2.6668644262567915,
        "4-latency_median": 362.547509765625,
        "4-latency_stdev": 2.6403463499953577,
        "5-latency_median": 362.5440246582031,
        "5-latency_stdev": 3.224301047544871,
        "6-latency_median": 362.5065460205078,
        "6-latency_stdev": 2.6760710391195124,
        "7-latency_median": 362.5503784179688,
        "7-latency_stdev": 2.6452712921359853
    }
}
```

Pull Request resolved: https://github.com/pytorch/benchmark/pull/1031

Reviewed By: FindHao

Differential Revision: D37888709

Pulled By: xuzhao9

fbshipit-source-id: fd145185c12a65eb41de8bd7ee34984b09c904e0
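To compare the eager and TorchDynamo runs in the commit message, the per-rank `"<rank>-latency_<stat>"` entries under `"metrics"` can be grouped by rank and averaged. The following is a minimal sketch, not part of the commit: the helper names `per_rank` and `mean_median` are hypothetical, only ranks 0 and 1 are copied from the two outputs to keep the example short, and the latency unit is not stated in the source.

```python
import re
from statistics import mean

# Keys in the "metrics" object look like "3-latency_median" or "3-latency_stdev".
_KEY = re.compile(r"^(\d+)-latency_(median|stdev)$")


def per_rank(metrics):
    """Group "<rank>-latency_<stat>" entries into {rank: {stat: value}}."""
    ranks = {}
    for key, value in metrics.items():
        match = _KEY.match(key)
        if match is None:
            continue  # skip any key that does not follow the pattern
        ranks.setdefault(int(match.group(1)), {})[match.group(2)] = value
    return ranks


def mean_median(metrics):
    """Average the per-rank median latencies (hypothetical summary statistic)."""
    return mean(r["median"] for r in per_rank(metrics).values() if "median" in r)


# Excerpt (ranks 0 and 1 only) of the two "metrics" objects in the commit message.
eager = {
    "0-latency_median": 375.4787841796875,
    "0-latency_stdev": 0.5074592125167986,
    "1-latency_median": 375.49486389160154,
    "1-latency_stdev": 0.5890401056068594,
}
dynamo = {
    "0-latency_median": 362.51340637207034,
    "0-latency_stdev": 2.6765847673834227,
    "1-latency_median": 362.52191162109375,
    "1-latency_stdev": 2.6585795546458573,
}

# Ratio > 1 means the TorchDynamo run had lower median latency than eager.
speedup = mean_median(eager) / mean_median(dynamo)
print(f"eager / torchdynamo median latency ratio: {speedup:.3f}")
```

In practice the full `"metrics"` object from each run's JSON output could be passed to `mean_median` unchanged, since the helper ignores keys that do not match the per-rank pattern.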