[BE] Fix flaky ProcessGroupGloo tests (#61396)
Summary:
A hypothesis for why tests such as https://github.com/pytorch/pytorch/issues/57469 are flaky: `pg = c10d.ProcessGroupGloo(...)` is not actually guaranteed to be a synchronization point, so some ranks may create the PG, run all the error checking (which never calls into the gloo APIs and therefore requires no synchronization), and then exit, all before other ranks have created the gloo PG. The affected tests have roughly the shape sketched below.
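As an illustration, here is a paraphrase of the test shape (written against the test-class fixture, so `store`, `self.rank`, `self.world_size`, and `self.opts()` come from the harness; the exact option names and assertion message are illustrative):
```python
import torch
import torch.distributed as c10d

# Construct the PG, then exercise only Python-side argument validation.
pg = c10d.ProcessGroupGloo(store, self.rank, self.world_size, self.opts())

opts = c10d.ReduceOptions()
opts.rootRank = -1  # deliberately invalid; rejected before any gloo call
with self.assertRaisesRegex(ValueError, "invalid root rank"):
    pg.reduce([torch.zeros(1)], opts)

# No collective ever ran, so nothing blocks this rank from returning and
# tearing down its TCP pairs while peers are still constructing their PG.
```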
This can result in the following error:
```
File "distributed/test_c10d_gloo.py", line 1037, in test_reduce_checks
  pg = c10d.ProcessGroupGloo(store, self.rank, self.world_size, self.opts())
RuntimeError: [/var/lib/jenkins/workspace/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [127.0.0.1]:35521
```
which indicates that the remote end has hung up. Furthermore, all the flaky tests in this file only perform error checking and never call into the gloo APIs, which further suggests this race is the root cause. I am not 100% sure this PR will fix it, since I have not been able to reproduce the issue even after 10000+ runs, but it happens regularly in CI.
To fix this, we add a `dist.barrier(group=pg)` call after creating the PG to enforce synchronization. Would be good to land this and observe whether it helps with the flakiness.
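Concretely, the change amounts to a helper along these lines (a minimal sketch; the helper name is illustrative, and `dist.barrier` accepts an explicit process group via its `group` argument):
```python
import torch.distributed as dist

def create_process_group_gloo(store, rank, world_size, opts):
    """Create a gloo PG, then block until every rank has created one."""
    pg = dist.ProcessGroupGloo(store, rank, world_size, opts)
    # The barrier is a real gloo collective: every rank blocks here until
    # all peers have finished constructing the process group, so no rank
    # can exit early and hang up on a peer mid-rendezvous.
    dist.barrier(group=pg)
    return pg
```
Tests would then call a helper like this instead of constructing `ProcessGroupGloo` directly.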
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61396
Reviewed By: mrshenli
Differential Revision: D29664189
Pulled By: rohan-varma
fbshipit-source-id: bc046d5d816fe6cb426522b85312383bfa3f90b7