Run a dummy rpc._all_gather in init_rpc to avoid shutdown timeout (#59801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59801
Fixes https://github.com/pytorch/pytorch/issues/59795.
The RPC calls in shutdown are no longer able to finish within 5s if
there are no other RPCs before `rpc.shutdown()` in that process,
because agent initialization can take longer than 5s. We did not
have this problem previously, because TensorPipe's backend
registry used RPC to communicate CUDA devices in `init_rpc`.
However, after #58753, `init_rpc` uses ProcessGroup to communicate
devices, and hence the channels/transports could still be
uninitialized after `init_rpc` returns. Running a dummy
`rpc._all_gather` in `init_rpc` initializes them before shutdown.
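To illustrate the reasoning above, here is a toy model of the timing problem. It is only a sketch and not PyTorch code: `ToyAgent`, `init_rpc`, `shutdown`, and `shutdown_succeeds` are hypothetical stand-ins, and the 60s init-time budget and 8s setup cost are assumptions chosen for illustration. It only models the idea that the first RPC pays the lazy channel/transport setup cost.

```python
# Toy model only: every name below is hypothetical, not a PyTorch API.
from dataclasses import dataclass

SHUTDOWN_TIMEOUT_S = 5.0  # the 5s budget for the RPC calls in shutdown


@dataclass
class ToyAgent:
    setup_cost_s: float            # assumed lazy channel/transport setup time
    transports_ready: bool = False

    def all_gather(self, timeout_s: float) -> bool:
        """Return True if this round would finish within timeout_s."""
        # Only a round that runs before setup is done pays the setup cost.
        cost = 0.0 if self.transports_ready else self.setup_cost_s
        if cost > timeout_s:
            return False  # timed out; transports stay uninitialized
        self.transports_ready = True
        return True


def init_rpc(agent: ToyAgent, warm_up: bool, init_timeout_s: float = 60.0) -> None:
    if warm_up:
        # Dummy round under the larger init-time budget (assumed 60s here).
        agent.all_gather(init_timeout_s)


def shutdown(agent: ToyAgent) -> bool:
    # Shutdown has to finish its round within the short 5s budget.
    return agent.all_gather(SHUTDOWN_TIMEOUT_S)


def shutdown_succeeds(setup_cost_s: float, warm_up: bool) -> bool:
    agent = ToyAgent(setup_cost_s)
    init_rpc(agent, warm_up=warm_up)
    return shutdown(agent)
```

In this model, `shutdown_succeeds(8.0, warm_up=False)` fails because the 8s setup lands inside the 5s shutdown budget, while `shutdown_succeeds(8.0, warm_up=True)` passes because the setup cost was already paid during init.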
Differential Revision:
D29039238
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Pulled By: mrshenli
fbshipit-source-id: 46f89b01a058a51d271ddef9084a67b220a067b7