Run clang-format on torch/lib/c10d (#25382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25382
clang-format reordered the includes in ProcessGroupNCCLTest.cpp,
which caused a compilation failure in
`ATen/cuda/CUDAMultiStreamGuard.h`.
To fix this, this commit also corrects the include list in
`ATen/cuda/CUDAMultiStreamGuard.h`.
Test Plan: Imported from OSS
Differential Revision: D17152634
Pulled By: pietern
fbshipit-source-id: c7b74d65a10dce5d602a98dc23fe2810235f932d