[distributed] Handle object collectives and NCCL. (#79034)
This fixes all object collectives under NCCL and adds some automated tests for them.
This PR *does not* fix sending tensors using object collectives.
It simplifies device handling by computing the appropriate device up front and then ensuring that all tensor ops happen on that device.
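For illustration only, here is a torch-free sketch of the "pick the device once, then keep every buffer on it" pattern. It is not the code from this PR: `FakeBuffer`, `pick_device`, `object_to_buffers`, and `buffers_to_object` are hypothetical names, and the rule "NCCL uses the current CUDA device, other backends use CPU" is an assumption based on NCCL only operating on CUDA tensors.

```python
import pickle
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeBuffer:
    """Hypothetical stand-in for a tensor: raw bytes plus the device they live on."""
    data: bytes
    device: str


def pick_device(backend: str, current_cuda_index: int = 0) -> str:
    # Assumption: NCCL only moves GPU memory, so its buffers go on the
    # current CUDA device; every other backend stays on CPU.
    if backend == "nccl":
        return f"cuda:{current_cuda_index}"
    return "cpu"


def object_to_buffers(obj, device: str):
    # Serialize the object, then place both the payload and its size
    # on the device that was chosen up front.
    payload = pickle.dumps(obj)
    size = FakeBuffer(len(payload).to_bytes(8, "little"), device)
    return FakeBuffer(payload, device), size


def buffers_to_object(buf: FakeBuffer, size: FakeBuffer):
    # Trim to the recorded size before unpickling, since a real
    # collective would pad every rank's payload to a common length.
    n = int.from_bytes(size.data, "little")
    return pickle.loads(buf.data[:n])


# Usage: compute the device once, then reuse it for every buffer.
device = pick_device("nccl", current_cuda_index=1)
buf, size = object_to_buffers({"rank": 1, "values": [1, 2, 3]}, device)
restored = buffers_to_object(buf, size)
```

Because the device is resolved before any buffer is created, no later step needs to guess where a buffer lives or move it between devices.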
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79034
Approved by: https://github.com/rohan-varma