Gloo-only CPU-based monitored barrier (#53773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53773
Closes https://github.com/pytorch/pytorch/issues/52876
Implements a barrier in which each rank does a send/recv to rank 0. Rank 0 waits for these requests and, if one does not arrive within the given timeout, throws an exception indicating which rank did not join.
This barrier is intended only for CPU use cases, is built into the Gloo process group, and will be used for debugging synchronization and hang issues.
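For illustration only, the check-in protocol described above can be sketched in plain Python, with threads standing in for ranks and a `queue.Queue` standing in for the send/recv channel to rank 0. This is not the Gloo implementation: the names `monitored_barrier_rank0` and `run_demo` are hypothetical, and reporting every missing rank (rather than only the first) is an assumption made to keep the sketch short.

```python
import queue
import threading
import time


def monitored_barrier_rank0(inbox, world_size, timeout_s):
    """Rank 0 side (hypothetical helper): wait for one check-in per other rank.

    Raises RuntimeError naming the ranks that did not check in before the
    deadline. The real barrier uses Gloo send/recv; a Queue stands in here.
    """
    pending = set(range(1, world_size))
    deadline = time.monotonic() + timeout_s
    while pending:
        remaining = deadline - time.monotonic()
        try:
            if remaining <= 0:
                raise queue.Empty
            rank = inbox.get(timeout=remaining)
        except queue.Empty:
            raise RuntimeError(
                f"ranks {sorted(pending)} did not join the barrier "
                f"within {timeout_s}s"
            ) from None
        pending.discard(rank)


def run_demo(world_size, stalled_ranks, timeout_s=0.2):
    """Simulate ranks 1..world_size-1 as threads; stalled ranks never check in."""
    inbox = queue.Queue()

    def worker(rank):
        if rank not in stalled_ranks:
            inbox.put(rank)  # stands in for the send to rank 0

    threads = [
        threading.Thread(target=worker, args=(r,)) for r in range(1, world_size)
    ]
    for t in threads:
        t.start()
    try:
        monitored_barrier_rank0(inbox, world_size, timeout_s)
        return "ok"
    except RuntimeError as exc:
        return str(exc)
    finally:
        for t in threads:
            t.join()


if __name__ == "__main__":
    print(run_demo(4, stalled_ranks=set()))
    print(run_demo(4, stalled_ranks={2}))
```

In this sketch a stalled rank shows up by number in the error message, which is the property that makes such a barrier useful for tracking down hangs.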
Test Plan: Added UT
Reviewed By: zhaojuanmao
Differential Revision: D26921357
fbshipit-source-id: 7c16e861b4b8ea2bdd67a36b3de7b1029af7d173