[C10d][UCC] Retain CUDA context in progress_loop (#121446)
UCC requires a CUDA context to be present, but `progress_loop` https://github.com/pytorch/pytorch/blob/f61192b014647947b52371bb03368b66694f5a34/torch/csrc/distributed/c10d/ProcessGroupUCC.cpp#L333 runs on a side thread that has no context present, even though it sets the device. This change retains the CUDA context in `progress_loop`.
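The underlying issue is that CUDA tracks the current context per host thread, so a context that is current on the main thread is not automatically current on a newly spawned thread. The sketch below is a CUDA-free model of that behavior, not the code from this PR: `FakeContext`, `primary_ctx`, `current_ctx`, `ensure_context`, and `side_thread_has_context` are all hypothetical names introduced for illustration. In real code the check-then-bind step would presumably map to driver calls such as `cuCtxGetCurrent`, `cuDevicePrimaryCtxRetain`, and `cuCtxSetCurrent`.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for a CUDA context; not a real CUDA type.
struct FakeContext {
  int device;
};

// Models the single primary context of one device, shared by all threads.
static FakeContext primary_ctx{0};

// Models the per-thread "current context" slot: every host thread starts
// with no context bound, regardless of what other threads have done.
thread_local FakeContext* current_ctx = nullptr;

// Check-then-bind: if this thread has no current context, bind the
// device's primary context to it. Returns the context now in effect.
FakeContext* ensure_context() {
  if (current_ctx == nullptr) {
    current_ctx = &primary_ctx;
  }
  return current_ctx;
}

// Spawns a side thread (as a progress loop would run on one) and reports
// whether that thread ends up with a current context. With retain=false
// the thread does nothing to obtain one.
bool side_thread_has_context(bool retain) {
  bool has_context = false;
  std::thread worker([&] {
    if (retain) {
      ensure_context();
    }
    has_context = (current_ctx != nullptr);
  });
  worker.join();
  return has_context;
}
```

In this model, calling `ensure_context()` on the main thread does not help the side thread: `side_thread_has_context(false)` still reports no context, while `side_thread_has_context(true)` reports one because the side thread binds the context itself.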
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121446
Approved by: https://github.com/kwen2501