poll for timed out futures in process group agent (#29601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29601
Follow up from https://github.com/pytorch/pytorch/pull/28392. Adds a background thread to `ProcessGroupAgent` that polls, at a pre-set interval, for RPCs that have timed out and marks each such future as completed with a timeout exception. The timed-out futures are also deleted from the corresponding maps `futures_` and `futureTimeouts`. Unit tests are added to ensure that timed-out RPCs are cleaned up appropriately.
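A minimal C++ sketch of the timeout bookkeeping described above, assuming one map from message id to pending future and a second, deadline-ordered map from deadline to message ids, so that expired entries always sit at the front. `SketchFuture`, `TimeoutTracker`, `pollTimedOut`, `complete`, `pending`, and the exact map layouts are hypothetical illustrations, not the actual `ProcessGroupAgent` code; only the idea of two maps is taken from the summary.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an RPC future; not PyTorch's actual future class.
struct SketchFuture {
  bool completed = false;
  std::optional<std::string> error;

  void markCompletedWithError(const std::string& msg) {
    completed = true;
    error = msg;
  }
};

using Clock = std::chrono::steady_clock;

// Hypothetical tracker: futures_ maps message id -> (future, deadline) and
// futureTimeouts_ maps deadline -> ids, ordered so the earliest deadline is first.
class TimeoutTracker {
 public:
  void add(std::int64_t id, std::shared_ptr<SketchFuture> fut, Clock::time_point deadline) {
    assert(fut != nullptr && futures_.count(id) == 0);  // ids must be unique
    futures_[id] = Entry{std::move(fut), deadline};
    futureTimeouts_[deadline].insert(id);
  }

  // A response arrived in time: remove the id from both maps.
  // Returns the future, or nullptr if the id is unknown (e.g. already timed out).
  std::shared_ptr<SketchFuture> complete(std::int64_t id) {
    auto it = futures_.find(id);
    if (it == futures_.end()) return nullptr;
    auto tIt = futureTimeouts_.find(it->second.deadline);
    if (tIt != futureTimeouts_.end()) {
      tIt->second.erase(id);
      if (tIt->second.empty()) futureTimeouts_.erase(tIt);
    }
    auto fut = std::move(it->second.future);
    futures_.erase(it);
    return fut;
  }

  // Marks every future whose deadline is <= now as failed and erases it
  // from both maps. Returns how many futures were marked as timed out.
  std::size_t pollTimedOut(Clock::time_point now) {
    auto end = futureTimeouts_.upper_bound(now);  // first deadline > now
    std::vector<std::int64_t> expired;
    for (auto it = futureTimeouts_.begin(); it != end; ++it) {
      expired.insert(expired.end(), it->second.begin(), it->second.end());
    }
    futureTimeouts_.erase(futureTimeouts_.begin(), end);

    std::size_t count = 0;
    for (std::int64_t id : expired) {
      auto it = futures_.find(id);
      if (it == futures_.end()) continue;
      auto fut = std::move(it->second.future);
      futures_.erase(it);
      fut->markCompletedWithError("RPC timed out");
      ++count;
    }
    return count;
  }

  std::size_t pending() const { return futures_.size(); }

 private:
  struct Entry {
    std::shared_ptr<SketchFuture> future;
    Clock::time_point deadline;
  };
  std::unordered_map<std::int64_t, Entry> futures_;
  std::map<Clock::time_point, std::set<std::int64_t>> futureTimeouts_;
};
```

Keying the second map by deadline means each poll only touches the expired prefix of the map. In a multithreaded agent, both maps would additionally need to be guarded by a mutex, since responses and the poller may access them concurrently.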
Also adds a `shutdown` variable to `ProcessGroupAgent` that controls shutting down this background thread; it could later be extended to control a clean shutdown of the process group agent itself.
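The background thread and its shutdown flag could be structured roughly as follows. This is a sketch under assumptions: `BackgroundPoller`, `pollFn`, and the use of a condition variable to wake the thread when the flag is set are illustrative choices, not the agent's actual implementation.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical: calls pollFn every `interval` on a background thread until
// shutdown() is called. Not the actual ProcessGroupAgent implementation.
class BackgroundPoller {
 public:
  BackgroundPoller(std::chrono::milliseconds interval, std::function<void()> pollFn)
      : interval_(interval), pollFn_(std::move(pollFn)) {
    assert(interval_.count() > 0);
    thread_ = std::thread([this] { loop(); });
  }

  ~BackgroundPoller() { shutdown(); }

  BackgroundPoller(const BackgroundPoller&) = delete;
  BackgroundPoller& operator=(const BackgroundPoller&) = delete;

  // Sets the shutdown flag, wakes the poller, and waits for it to exit.
  void shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shutdown_ = true;
    }
    cv_.notify_all();
    if (thread_.joinable()) thread_.join();
  }

 private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!shutdown_) {
      // True as soon as shutdown_ is set; false once the interval elapses.
      if (cv_.wait_for(lock, interval_, [this] { return shutdown_; })) break;
      lock.unlock();
      pollFn_();  // e.g. fail and erase futures whose deadline has passed
      lock.lock();
    }
  }

  std::chrono::milliseconds interval_;
  std::function<void()> pollFn_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool shutdown_ = false;
  std::thread thread_;
};
```

Waiting on the condition variable with `wait_for`, rather than sleeping for the full interval, lets `shutdown()` return promptly even when the polling interval is long. The poll callback runs without the lock held so that it cannot block a concurrent shutdown request.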
ghstack-source-id: 94175131
Test Plan: Added unit tests
Differential Revision: D18434215
fbshipit-source-id: c48abdb8759fe1447200ec66bb9d4b1c50ec4535