SemanticDiff

pytorch
bbedfd91 - Run a dummy rpc._all_gather in init_rpc to avoid shutdown timeout (#59801)


Commit

3 years ago

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59801

Fixes https://github.com/pytorch/pytorch/issues/59795.

The RPC calls in shutdown are no longer able to finish within 5s if there are no other RPCs before `rpc.shutdown()` in that process, because agent initialization can take longer than 5s. We did not have this problem previously, because TensorPipe's backend registry used RPC to communicate CUDA devices in `init_rpc`. However, after #58753, `init_rpc` uses ProcessGroup to communicate devices, so the channels/transports could still be uninitialized when `init_rpc` returns. This change runs a dummy `rpc._all_gather` in `init_rpc` so that this setup happens before `init_rpc` returns rather than during `rpc.shutdown()`.

Differential Revision: D29039238

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Pulled By: mrshenli

fbshipit-source-id: 46f89b01a058a51d271ddef9084a67b220a067b7
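The failure mode described in the summary has two parts. Channels/transports are set up lazily on the first RPC, and shutdown's RPCs must finish within 5s. A toy model can illustrate how that combination causes the timeout. This is a hypothetical simulation, not PyTorch code. `ToyAgent`, `toy_init_rpc`, `toy_shutdown`, and the cost values are invented for illustration, and time is simulated rather than measured.

```python
from dataclasses import dataclass

# Hypothetical toy model, not PyTorch code. Time is simulated, not measured.
SHUTDOWN_TIMEOUT_S = 5.0  # budget for the RPC calls made during shutdown


@dataclass
class ToyAgent:
    channel_setup_cost_s: float   # simulated one-time cost of lazy channel/transport setup
    per_rpc_cost_s: float = 0.01  # simulated cost of an RPC round once channels exist
    channels_ready: bool = False

    def rpc_round(self) -> float:
        """Run one simulated RPC round (e.g. a gather) and return its latency."""
        latency = self.per_rpc_cost_s
        if not self.channels_ready:
            # The first RPC pays for setting up channels/transports.
            latency += self.channel_setup_cost_s
            self.channels_ready = True
        return latency


def toy_init_rpc(agent: ToyAgent, warm_up: bool) -> None:
    """Hypothetical init; optionally runs one dummy round so setup happens here."""
    if warm_up:
        # In this toy model, init has no 5s budget, so the setup cost is harmless here.
        agent.rpc_round()


def toy_shutdown(agent: ToyAgent) -> bool:
    """Hypothetical shutdown; returns True if its RPC round fits the timeout."""
    return agent.rpc_round() <= SHUTDOWN_TIMEOUT_S


if __name__ == "__main__":
    cold = ToyAgent(channel_setup_cost_s=7.0)
    toy_init_rpc(cold, warm_up=False)
    print("no warm-up, shutdown within timeout:", toy_shutdown(cold))   # False

    warm = ToyAgent(channel_setup_cost_s=7.0)
    toy_init_rpc(warm, warm_up=True)
    print("with warm-up, shutdown within timeout:", toy_shutdown(warm))  # True
```

The warm-up round mirrors the idea in the commit title. The one-time setup cost is paid inside init instead of on the first RPC issued by shutdown, which is the RPC that must fit in the 5s budget.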

Author

mrshenli

Committer

facebook-github-bot

Parents