Move all nccl from torch_python to torch_cuda (#36193)
Summary:
Because `torch_python` is supposed to be thin wrapper around `torch`
In this PR, all invocation of functions from nccl library are moved from python_nccl.cpp (which is part of torch_python) to nccl.cpp (which is part of torch_cuda)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36193
Test Plan: CI
Differential Revision: D20930047
Pulled By: malfet
fbshipit-source-id: 7f278610077df6ac5dc3471c1a1b5d51e653ef9c