Kill process in `wait_for_process` if `SIGINT` fails to terminate it (#105625)
#98035 adds logic to `wait_for_process` that catches a timeout exception and sends `SIGINT` to the process before waiting on it again with a timeout. However, if this second wait also times out, the `wait` call in the `finally` block (which has no timeout) can hang indefinitely.
This PR kills the process if a second timeout exception occurs after the `SIGINT` signal is sent.
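The escalation can be illustrated with a minimal Python sketch. It assumes `p` is a `subprocess.Popen` object on a POSIX system. The name `wait_for_process_sketch` and the `sigint_grace` parameter (default 5 seconds) are hypothetical, and this is not the actual `wait_for_process` code from the repository.

```python
import signal
import subprocess


def wait_for_process_sketch(p, timeout=None, sigint_grace=5.0):
    """Wait for `p`, escalating from SIGINT to SIGKILL if it does not exit.

    Hypothetical helper for illustration; `sigint_grace` is an assumed
    parameter, not necessarily what the real `wait_for_process` uses.
    """
    try:
        return p.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # First timeout: ask the child to shut down cleanly, giving it a
        # chance to flush output (e.g. test reports) before exiting.
        p.send_signal(signal.SIGINT)
        try:
            return p.wait(timeout=sigint_grace)
        except subprocess.TimeoutExpired:
            # Second timeout: SIGINT did not stop the child, so kill it.
            # Without this, the unbounded wait() in `finally` could hang.
            p.kill()
        # Re-raise the original timeout so the caller sees the failure.
        raise
    except BaseException:
        # Any other interruption (e.g. KeyboardInterrupt): kill the child.
        p.kill()
        raise
    finally:
        # Reap the child. It has exited or been sent SIGKILL by now, so
        # this wait without a timeout should return promptly.
        p.wait()
```

With a child that ignores `SIGINT`, the sketch raises `subprocess.TimeoutExpired` after roughly `timeout + sigint_grace` seconds instead of blocking, and `p.returncode` is `-signal.SIGKILL`. A child that exits on `SIGINT` within the grace period instead has its exit status returned normally.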
CC @clee2000 @ptrblck @xwang233 @kwen2501
This may also reduce turnaround time for distributed timeouts like those seen in https://hud.pytorch.org/pr/pytorch/pytorch/105274#15148799113
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105625
Approved by: https://github.com/ezyang