[Model Averaging] Enforce a synchronization before allreduce parameters (#60891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60891
This fix is particularly useful for local SGD when the averaging period is very small, where the gradient allreduce within the per-machine subgroup can conflict with the global parameter allreduce across the world process group.
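A minimal sketch of the idea, not the exact diff: the function name `average_parameters` and its structure below are illustrative, loosely modeled on the model-averaging utilities in `torch.distributed.algorithms.model_averaging`. The key point is the explicit synchronization before the global allreduce.
```
import torch
import torch.distributed as dist

def average_parameters(module, process_group):
    """Allreduce-average module parameters across process_group.

    Synchronize first so any pending subgroup gradient allreduce
    finishes before the global parameter allreduce starts; otherwise
    the two collectives can conflict when the averaging period is
    very small.
    """
    # Hypothetical placement of the fix: wait for outstanding work
    # on the default CUDA stream (e.g., the per-machine subgroup
    # gradient allreduce) before issuing the global collective.
    if torch.cuda.is_available():
        torch.cuda.synchronize()

    world_size = dist.get_world_size(group=process_group)
    for param in module.parameters():
        dist.all_reduce(param.data, group=process_group)
        param.data.div_(world_size)
```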
ghstack-source-id: 132564252
Test Plan:
f281873295 (#Try1) failed due to a conflict between the global process group and the subgroup:
```
<Thread(configerator-monitor-singleton, started 139839806633728)>
File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/tmp/jetter.gson7tr3/configerator/client.py", line 348, in _monitor_loop
self._parent_thread.join(self._interval_ms / 1000)
File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 1015, in join
self._wait_for_tstate_lock(timeout=max(timeout, 0))
File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
```
Fixed after adding an explicit synchronization: f282044866, f282241800
Reviewed By: rohan-varma
Differential Revision: D29434597
fbshipit-source-id: a4f777fc26f379639f85fda32de425cd3b337b33