pytorch
4541f603 - Gloo-only CPU-based monitored barrier (#53773)

Commit
3 years ago
Gloo-only CPU-based monitored barrier (#53773)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53773

Closes https://github.com/pytorch/pytorch/issues/52876

Implements a barrier by doing send/recv to rank 0. Rank 0 waits for these requests and, on timeout, throws an exception indicating which rank did not join within the given timeout. This barrier is intended only for CPU use cases, is built into the Gloo process group, and will be used for debugging synchronization/hang issues.

Test Plan: Added UT

Reviewed By: zhaojuanmao

Differential Revision: D26921357

fbshipit-source-id: 7c16e861b4b8ea2bdd67a36b3de7b1029af7d173
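
The idea described in the summary (every rank checks in with rank 0, and rank 0 reports who is missing on timeout) can be illustrated with a small standard-library simulation. This is only a sketch under stated assumptions, not the Gloo implementation: threads stand in for ranks, a `queue.Queue` stands in for send/recv, and the names `rank0_wait` and `run_demo` are hypothetical helpers invented for this example.

```python
import queue
import threading
import time


def rank0_wait(world_size, inbox, timeout):
    """Rank 0 side: collect one check-in per peer, or raise naming the absentees.

    Hypothetical helper; `inbox` stands in for the recv side of rank 0.
    """
    pending = set(range(1, world_size))
    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        try:
            if remaining <= 0:
                raise queue.Empty
            # Blocks until some peer checks in or the time budget runs out.
            pending.discard(inbox.get(timeout=remaining))
        except queue.Empty:
            raise RuntimeError(
                f"ranks {sorted(pending)} did not join the barrier within {timeout}s"
            ) from None


def run_demo(world_size, absent, timeout=0.5):
    """Simulate peers as threads; ranks listed in `absent` never check in."""
    inbox = queue.Queue()
    peers = [
        threading.Thread(target=inbox.put, args=(rank,))  # the "send" to rank 0
        for rank in range(1, world_size)
        if rank not in absent
    ]
    for peer in peers:
        peer.start()
    try:
        rank0_wait(world_size, inbox, timeout)
        return "ok"
    except RuntimeError as err:
        return str(err)
    finally:
        for peer in peers:
            peer.join()


if __name__ == "__main__":
    print(run_demo(4, absent=set()))
    print(run_demo(4, absent={2}, timeout=0.2))
```

Funnelling every check-in through rank 0 gives a single place that knows exactly which ranks are missing, which is what makes the error message useful when debugging a hang. The cost is a serialized collection step, which fits the debugging purpose stated in the summary.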