[c10d] Profiler support for NCCL p2p collectives (#56427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56427
This PR enables profiling of NCCL send/recv operations, matching the existing support for MPI and Gloo.
The approach mirrors the one used for NCCL collectives: we create the `recordingFunction` in `initWork`, then add a callback that runs the profiler end callbacks. The new tests follow the existing send/recv tests for Gloo and MPI.
We also test with both the autograd profiler and `torch.profiler`.
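
The pattern described above (start recording when the work object is created, finish recording in a completion callback) can be illustrated with a toy, pure-Python sketch. This is not the c10d implementation: `ToyProfiler`, `init_work`, and the event names are hypothetical stand-ins, and a `concurrent.futures.Future` plays the role of the asynchronous work handle.

```python
import time
from concurrent.futures import Future


class ToyProfiler:
    """Hypothetical stand-in for a profiler; not a PyTorch API."""

    def __init__(self):
        self.events = []

    def begin(self, name):
        # Open an event record as soon as the operation is created.
        record = {"name": name, "start": time.perf_counter(), "end": None}
        self.events.append(record)
        return record

    def end(self, record):
        # Close the event record; this plays the role of the "end callback".
        record["end"] = time.perf_counter()


def init_work(profiler, op_name):
    """Create an async work handle and tie the profiler event to its lifetime.

    Assumption: this only mimics the shape of the approach in the summary
    (record at creation, end in a completion callback).
    """
    future = Future()
    record = profiler.begin(op_name)
    # The callback fires when the future completes, closing the event.
    future.add_done_callback(lambda _fut: profiler.end(record))
    return future


profiler = ToyProfiler()
send_work = init_work(profiler, "nccl:send")  # illustrative event name
recv_work = init_work(profiler, "nccl:recv")  # illustrative event name

# Both events are open while the operations are still in flight.
pending = [e["name"] for e in profiler.events if e["end"] is None]

# Completing the work triggers the callbacks that close the events.
send_work.set_result(None)
recv_work.set_result(None)
finished = [e["name"] for e in profiler.events if e["end"] is not None]
```

Tying the end of the event to a completion callback, rather than to the return of the call that enqueued the operation, lets the recorded span cover the asynchronous operation itself.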
ghstack-source-id: 128142666
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D27866600
fbshipit-source-id: f29d9103e22b22f658632fece0df9ba36911fc62