benchmark
a6843e77 - dist.init_process_group only if it's not already init'd (#1499)

2 years ago
Summary: Without this check, initializing the model twice in the same Python process fails with:

```
Traceback (most recent call last):
  File "/fsx/users/janeyx/conda/envs/torchbenchmark/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/fsx/users/janeyx/conda/envs/torchbenchmark/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/scratch/janeyx/work/benchmark/userbenchmark/optim/__init__.py", line 406, in <module>
    run(sys.argv[1:])
  File "/scratch/janeyx/work/benchmark/userbenchmark/optim/__init__.py", line 397, in run
    results = run_benchmarks(args.optims, args.funcs, args.models, args.devices)
  File "/scratch/janeyx/work/benchmark/userbenchmark/optim/__init__.py", line 336, in run_benchmarks
    bm = run_model(mn, d, O, defaults, func_str)
  File "/scratch/janeyx/work/benchmark/userbenchmark/optim/__init__.py", line 313, in run_model
    raise e
  File "/scratch/janeyx/work/benchmark/userbenchmark/optim/__init__.py", line 288, in run_model
    params = get_model_params(modelName, device)
  File "/scratch/janeyx/work/benchmark/userbenchmark/optim/__init__.py", line 240, in get_model_params
    params = _get_model_params(Model(device=device, test='train'))
  File "/scratch/janeyx/work/benchmark/torchbenchmark/util/model.py", line 20, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/scratch/janeyx/work/benchmark/torchbenchmark/models/torchrec_dlrm/__init__.py", line 46, in __init__
    dist.init_process_group(backend=backend)
  File "/fsx/users/janeyx/conda/envs/torchbenchmark/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 131, in wrapper
    return func(*args, **kwargs)
  File "/fsx/users/janeyx/conda/envs/torchbenchmark/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 902, in init_process_group
    raise RuntimeError("trying to initialize the default process group " "twice!")
```

For optim benchmarking, we initialize the model twice (once for CPU, once for CUDA) and therefore hit this error. Initializing the same model a second time in one process should not fail.

Pull Request resolved: https://github.com/pytorch/benchmark/pull/1499

Reviewed By: xuzhao9

Differential Revision: D44274202

Pulled By: janeyx99

fbshipit-source-id: c3c64396cc448fa6f514a29088b72e7b89ae973b
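The guard described in the commit title can be sketched as follows. This is a minimal illustration, not the actual torchbenchmark change: the helper names `init_process_group_once` and `demo_twice` are hypothetical, and the single-process setup (gloo backend, file-backed rendezvous, `rank=0`, `world_size=1`) is an assumption chosen so the example runs without the `MASTER_ADDR`/`MASTER_PORT` environment variables that the default `env://` initialization expects.

```python
import os
import tempfile

import torch.distributed as dist


def init_process_group_once(**init_kwargs) -> bool:
    """Hypothetical helper: create the default process group unless one exists.

    Returns True if this call created the group, or False if an earlier
    caller in the same process (e.g. a previously constructed model)
    already did. In the False case, init_kwargs are ignored.
    """
    if dist.is_initialized():
        return False
    dist.init_process_group(**init_kwargs)
    return True


def demo_twice() -> tuple:
    """Hypothetical demo: simulate constructing the same model twice in one process."""
    # Assumed single-process setup: a file-backed rendezvous avoids needing
    # MASTER_ADDR/MASTER_PORT environment variables.
    store_path = os.path.join(tempfile.mkdtemp(), "pg_store")
    kwargs = dict(
        backend="gloo",
        init_method=f"file://{store_path}",
        rank=0,
        world_size=1,
    )
    try:
        first = init_process_group_once(**kwargs)   # creates the default group
        second = init_process_group_once(**kwargs)  # skipped instead of raising RuntimeError
        return first, second
    finally:
        # Tear down so later code starts from a clean, uninitialized state.
        if dist.is_initialized():
            dist.destroy_process_group()


if __name__ == "__main__":
    print(demo_twice())  # (True, False)
```

Checking `dist.is_initialized()` first lets the second construction reuse the existing default group rather than raising. The trade-off is that any different arguments passed by the second caller, such as a different backend, are silently ignored.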