[torch][elastic] Make final agent barrier shut down properly
Summary:
When workers finish their work, the TorchElastic (TE) agent starts the `synchronize_barrier` procedure. The barrier waits for the other agents at the end of execution.
A race condition can occur: the barrier uses a TCPStore that is hosted on Rank0. When Rank0 finishes its work, other ranks may still be executing the `get_all` method. Some of them will then fail because the TCPStore has already been destroyed.
The fix adds a check to the Rank0 process: Rank0 now waits for all other ranks to finish before it terminates.
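The idea behind the fix can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: `InMemoryStore`, `barrier_get_all`, `run_agents`, the `exit_barrier/` prefix and the `.FIN` key suffix are hypothetical names chosen for illustration, and a thread-safe dictionary stands in for the real TCPStore. Each rank marks itself finished after reading all barrier keys, and Rank0 additionally waits for every finish marker before returning.

```python
import threading
import time


class InMemoryStore:
    """Hypothetical stand-in for TCPStore: set() plus a get() that blocks until the key exists."""

    def __init__(self):
        self._data = {}
        self._cond = threading.Condition()

    def set(self, key, value):
        with self._cond:
            self._data[key] = value
            self._cond.notify_all()

    def get(self, key, timeout=5.0):
        with self._cond:
            if not self._cond.wait_for(lambda: key in self._data, timeout):
                raise TimeoutError(key)
            return self._data[key]

    def keys_with_suffix(self, suffix):
        with self._cond:
            return [k for k in self._data if k.endswith(suffix)]


def barrier_get_all(store, rank, prefix, size):
    # Announce arrival, then wait until every rank has arrived.
    store.set(f"{prefix}{rank}", b"done")
    data = [store.get(f"{prefix}{idx}") for idx in range(size)]
    # Signal that this rank no longer needs the store.
    store.set(f"{prefix}{rank}.FIN", b"FIN")
    if rank == 0:
        # Rank0 hosts the store, so it must be the last one to leave.
        for peer in range(size):
            store.get(f"{prefix}{peer}.FIN")
    return data


def run_agents(size=3):
    """Run one thread per rank; return the .FIN keys visible when Rank0 leaves the barrier."""
    store = InMemoryStore()
    fin_seen_by_rank0 = []

    def agent(rank):
        time.sleep(0.05 * rank)  # stagger arrivals so Rank0 reaches the barrier first
        barrier_get_all(store, rank, "exit_barrier/", size)
        if rank == 0:
            fin_seen_by_rank0.extend(store.keys_with_suffix(".FIN"))

    threads = [threading.Thread(target=agent, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    return fin_seen_by_rank0
```

In this sketch, Rank0 cannot return (and, in the real setting, tear down the store) until every other rank has written its finish marker, which closes the window in which a slower rank would read from a destroyed store.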
Test Plan: unit tests
Differential Revision: D35227180
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74931
Approved by: https://github.com/kiukchung