Reinstate ncclCommDestroy (#17943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17943
Together with xw285cornell, I came up with a solution for the static
destruction order fiasco that caused the NCCL context to be destroyed
**after** the CUDA context had already been destroyed. This commit
destroys all cached NCCL contexts as soon as the last NCCL-related
Caffe2 operator instance is destructed, thereby avoiding a dependency
on static variable destruction order.
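
The approach described above can be illustrated with a minimal,
self-contained sketch. It is not the code in this commit: every name
(FakeNcclContext, ContextCache, FakeNcclOp) is hypothetical, and the
fake context only counts live instances where a real one would call
ncclCommDestroy in its destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a cached NCCL context. A real context would
// release its communicators (ncclCommDestroy) in the destructor.
struct FakeNcclContext {
  static int& liveCount() {
    static int count = 0;
    return count;
  }
  FakeNcclContext() { ++liveCount(); }
  ~FakeNcclContext() { --liveCount(); }
};

// Hypothetical cache that tracks how many operator instances are alive
// and drops every cached context when that number reaches zero.
class ContextCache {
 public:
  static ContextCache& instance() {
    // Deliberately leaked, so no cleanup depends on static destruction.
    static ContextCache* cache = new ContextCache();
    return *cache;
  }

  void addUser() {
    std::lock_guard<std::mutex> guard(mutex_);
    ++users_;
  }

  void removeUser() {
    std::lock_guard<std::mutex> guard(mutex_);
    assert(users_ > 0);
    if (--users_ == 0) {
      // Last operator is gone: destroy all cached contexts now, while
      // the resources they depend on are still valid.
      contexts_.clear();
    }
  }

  // Returns the cached context for `key`, creating it on first use.
  FakeNcclContext* get(const std::string& key) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto& slot = contexts_[key];
    if (!slot) {
      slot.reset(new FakeNcclContext());
    }
    return slot.get();
  }

 private:
  ContextCache() = default;

  std::mutex mutex_;
  std::size_t users_ = 0;
  std::map<std::string, std::unique_ptr<FakeNcclContext>> contexts_;
};

// Hypothetical operator: registers with the cache for its whole lifetime.
class FakeNcclOp {
 public:
  FakeNcclOp() { ContextCache::instance().addUser(); }
  ~FakeNcclOp() { ContextCache::instance().removeUser(); }
  FakeNcclOp(const FakeNcclOp&) = delete;
  FakeNcclOp& operator=(const FakeNcclOp&) = delete;
};
```

In this sketch the cached contexts outlive any single operator but are
released when the last FakeNcclOp is destructed, which normally happens
before static destructors run at process exit.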
Reviewed By: xw285cornell
Differential Revision: D14429724
fbshipit-source-id: fe5ce4b02b1002af8d9f57f6fa089b7a80e316ce