[NCCL] Add torch::cuda::nccl::send/recv (#45926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45926
Because torch/csrc/cuda/nccl.cpp is compiled as part of the torch_cuda library, calling these send/recv functions from ProcessGroupNCCL.cpp avoids linking a second instance of libnccl.a into torch_python.
Fixes an issue similar to https://github.com/pytorch/pytorch/issues/42517
ghstack-source-id: 113910530
Test Plan: waitforsandcastle
Reviewed By: jiayisuse
Differential Revision: D24147802
fbshipit-source-id: d8901fdb31bdc22ddca2364f8050844639a1beb3