Fix flaky NCCL error handling tests. (#42149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42149
Some of these tests were flaky because they could kill a process without
first cleaning up its ProcessGroup. As a result, the FileStore was not
cleaned up properly, which caused other processes in the group to crash.
Fix this by explicitly deleting the process_group before forcibly bringing
a process down, so that the FileStore is cleaned up first.
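
The pattern can be illustrated with a minimal, stdlib-only sketch. This is
not the test code from this PR: FileBackedGroup, Worker, make_store_path and
die_forcibly are hypothetical names, and the class only stands in for a
ProcessGroup backed by a FileStore. It relies on CPython running __del__ as
soon as the last reference is dropped, and on os._exit skipping normal
interpreter cleanup.

```python
import os
import tempfile


def make_store_path():
    # Hypothetical helper: a fresh path for the file-backed store.
    return os.path.join(tempfile.mkdtemp(), "store")


class FileBackedGroup:
    """Hypothetical stand-in for a ProcessGroup backed by a FileStore."""

    def __init__(self, path):
        self.path = path
        # Create the store file, as a file-based rendezvous would.
        with open(path, "w") as f:
            f.write("rank0\n")

    def __del__(self):
        # Cleanup only happens if the object is destroyed before the
        # process goes away.
        if os.path.exists(self.path):
            os.remove(self.path)


class Worker:
    def __init__(self, path):
        self.process_group = FileBackedGroup(path)

    def die_forcibly(self, exit_fn=os._exit):
        # Drop the only reference first so the cleanup in __del__ runs,
        # then bring the process down without any further cleanup.
        del self.process_group
        exit_fn(1)
```

Passing a different exit_fn (for example a list's append method) makes the
ordering observable without actually terminating the interpreter.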
ghstack-source-id: 108629057
Test Plan: waitforbuildbot
Reviewed By: mrshenli
Differential Revision: D22785042
fbshipit-source-id: c31d0f723badbc23b7258e322f75b57e0a1a42cf