pytorch
83513506 - poll for timed out futures in process group agent (#29601)

Committed 5 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29601

Follow up from https://github.com/pytorch/pytorch/pull/28392. Adds a background thread to `ProcessGroupAgent` that polls for timed-out RPCs at a preset interval, marks each RPC that has exceeded its timeout as completed with a timeout exception, and deletes its future from the corresponding maps, `futures_` and `futureTimeouts`. Unit tests are added to ensure that timed-out RPCs are cleaned up appropriately.

Also adds a `shutdown` variable to the process group agent to control shutdown of this background thread. It could later be extended to control a clean shutdown of the process group agent as a whole.

ghstack-source-id: 94175131

Test Plan: Added unit tests

Differential Revision: D18434215

fbshipit-source-id: c48abdb8759fe1447200ec66bb9d4b1c50ec4535
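The polling loop described in this commit can be sketched in C++ roughly as follows. This is a simplified, standalone sketch under stated assumptions, not the actual `ProcessGroupAgent` code: the class name `TimeoutPoller`, the methods `addRpc`, `markCompleted`, `pendingCount`, and `pollTimedOutRpcs`, the member `futureTimeouts_`, and the use of `std::promise<std::string>` for RPC responses are all hypothetical. It mirrors the two maps the message mentions (one future per RPC id, plus deadlines ordered by time). It wakes every `pollInterval`, fails each expired future with a timeout exception, erases it, and exits once the shutdown flag is set.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <future>
#include <map>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for an RPC agent's timeout bookkeeping.
class TimeoutPoller {
 public:
  using Clock = std::chrono::steady_clock;

  // Members are declared so that pollThread_ is initialized last,
  // after the mutex, maps, and flag it uses already exist.
  explicit TimeoutPoller(std::chrono::milliseconds pollInterval)
      : pollInterval_(pollInterval),
        pollThread_(&TimeoutPoller::pollTimedOutRpcs, this) {}

  ~TimeoutPoller() { shutdown(); }

  // Registers an outstanding RPC and returns a future for its response.
  std::future<std::string> addRpc(int64_t id, std::chrono::milliseconds timeout) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::future<std::string> fut = futures_[id].get_future();
    futureTimeouts_[Clock::now() + timeout].push_back(id);
    return fut;
  }

  // Completes an RPC normally; returns false if it already timed out.
  bool markCompleted(int64_t id, std::string response) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = futures_.find(id);
    if (it == futures_.end()) return false;
    it->second.set_value(std::move(response));
    futures_.erase(it);
    return true;
  }

  std::size_t pendingCount() {
    std::lock_guard<std::mutex> lock(mutex_);
    return futures_.size();
  }

  // Sets the shutdown flag and joins the polling thread; safe to call twice.
  void shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shutdown_ = true;
    }
    cv_.notify_all();
    if (pollThread_.joinable()) pollThread_.join();
  }

 private:
  void pollTimedOutRpcs() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!shutdown_) {
      // Sleep for one poll interval, waking early if shutdown() is called.
      cv_.wait_for(lock, pollInterval_, [this] { return shutdown_; });
      if (shutdown_) break;
      // futureTimeouts_ is ordered by deadline, so expired entries come first.
      auto end = futureTimeouts_.upper_bound(Clock::now());
      for (auto it = futureTimeouts_.begin(); it != end; ++it) {
        for (int64_t id : it->second) {
          auto f = futures_.find(id);
          if (f == futures_.end()) continue;  // already completed normally
          f->second.set_exception(
              std::make_exception_ptr(std::runtime_error("RPC timed out")));
          futures_.erase(f);
        }
      }
      futureTimeouts_.erase(futureTimeouts_.begin(), end);
    }
  }

  std::chrono::milliseconds pollInterval_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool shutdown_ = false;
  std::unordered_map<int64_t, std::promise<std::string>> futures_;
  std::map<Clock::time_point, std::vector<int64_t>> futureTimeouts_;
  std::thread pollThread_;
};
```

Keying the deadline map by time point keeps expired entries at the front, so each poll walks only the RPCs whose deadlines have passed. Waiting on a condition variable instead of sleeping lets `shutdown()` stop the thread without waiting out a full interval, which is one way a `shutdown` flag like the one in this commit can be wired up.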