pytorch
6663ae55 - [2/n] Thread PG: add class _World to distributed_c10d.py (#781) (#88471)

Commit

2 years ago

Summary:

X-link: https://github.com/pytorch/torchrec/pull/781

Move a number of module-level globals behind instance methods and replace every use of them. All PG-related globals now live under the _World class, and a singleton instance, _world, holds the default state.

This creates an undocumented extension point that gives full control over how c10d state behaves. One simple hack is to replace _world with an implementation that uses a threadlocal, which enables per-thread PGs. This almost gets DDP working; the PG is still missing an implementation of all_reduce.

This enables notebook usage of PTD (PyTorch Distributed), which is a big deal for learning it: https://gist.github.com/kumpera/32cb051fa26b8cad8bdf671f968dcd68

This change preserves backward compatibility (BC) by keeping the global variables around and having the default _World wrap them.

I have relinked this diff to a new GitHub PR so that I can update it. The original PR is:

> Pull Request resolved: https://github.com/pytorch/pytorch/pull/86348

Differential Revision: D40236769
Pulled By: yhcharles
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88471
Approved by: https://github.com/gnadathur, https://github.com/rohan-varma

Author

Rodrigo Kumpera

Committer

pytorchmergebot

Parents

fc8f2f66
