[c10d] PGNCCL refactor part 2: Simplify ProcessGroupNCCL into single-device style (#119421)
Part 2 and last part of #118674:
Introduce actual "single-device" code change to ProcessGroupNCCL.
assert size == 1 and test refactor have been done in #119099.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119421
Approved by: https://github.com/shuqiangzhang