Use store-based barrier in init_process_group. (#49419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49419
As described in https://github.com/pytorch/pytorch/issues/48110, the
newly introduced `barrier()` in `init_process_group` corrupts NCCL
communicator state because it simulates a barrier by running an allreduce
on a set of default devices. As a result, subsequent NCCL operations
might not behave as expected. This change uses a store-based barrier in
`init_process_group` instead.
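
For illustration only, a minimal sketch of the general idea behind a
store-based barrier: each rank increments a shared counter in a key-value
store and polls until the counter reaches the world size, so no collective
(and no NCCL communicator) is involved. This is not the code added by this
PR. `InMemoryStore`, `store_based_barrier`, `run_demo`, and the key name
`barrier_key` are hypothetical names chosen for the example, and threads
stand in for ranks.

```python
import threading
import time


class InMemoryStore:
    """Hypothetical stand-in for a key-value store with an atomic add()."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def add(self, key, amount):
        # Atomically add `amount` to the counter at `key`; return the new value.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount
            return self._data[key]


def store_based_barrier(store, world_size, key="barrier_key",
                        timeout=5.0, poll_interval=0.01):
    """Block until `world_size` participants have checked in on `key`."""
    # Announce this participant's arrival.
    store.add(key, 1)
    deadline = time.monotonic() + timeout
    # add(key, 0) reads the counter without changing it.
    while store.add(key, 0) < world_size:
        if time.monotonic() > deadline:
            raise TimeoutError("timed out waiting for all ranks at the barrier")
        time.sleep(poll_interval)


def run_demo(world_size=4):
    """Simulate `world_size` ranks with threads (an assumption for the demo)."""
    store = InMemoryStore()
    passed = []

    def worker(rank):
        time.sleep(0.01 * rank)  # stagger arrivals
        store_based_barrier(store, world_size)
        passed.append(rank)

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(passed), store.add("barrier_key", 0)
```

Because the barrier only touches the store, it leaves device and
communicator state untouched, which is the property the allreduce-based
barrier lacked.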
ghstack-source-id: 118861776
Test Plan:
1) unit test added.
2) waitforbuildbot
Reviewed By: mrshenli
Differential Revision: D25566550
fbshipit-source-id: ab083b67b634d7c515f4945deb228f959b27c936