Add support for NCCL alltoall (#44374)
Summary:
NCCL `alltoall_single` was already added in https://github.com/pytorch/pytorch/issues/42514. This PR adds NCCL `alltoall`.
The difference between the two: `alltoall_single` works on a single tensor and sends/receives slices of that tensor, while `alltoall` works on a list of tensors and sends/receives the tensors in that list.
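To illustrate the difference in data movement, here is a minimal sketch in plain Python. It only simulates which data ends up on which rank; it does not call torch or NCCL and is not the implementation in this PR. The function names `simulate_all_to_all_single` and `simulate_all_to_all` are hypothetical, lists stand in for tensors, and the single-tensor case assumes equal-sized slices.

```python
# Plain-Python simulation of the data movement only.
# Lists stand in for tensors; no torch or NCCL is involved.
# Function names are hypothetical, chosen for illustration.


def simulate_all_to_all_single(inputs):
    """Simulate the single-tensor variant.

    inputs[r] is rank r's one flat buffer, assumed to split evenly
    into world_size slices. Slice d of rank r's buffer goes to rank d.
    Returns outputs, where outputs[r] is rank r's one flat output buffer.
    """
    world_size = len(inputs)
    outputs = []
    for dst in range(world_size):
        received = []
        for src in range(world_size):
            buf = inputs[src]
            chunk = len(buf) // world_size
            # Rank dst receives slice number dst from every source rank.
            received.extend(buf[dst * chunk:(dst + 1) * chunk])
        outputs.append(received)
    return outputs


def simulate_all_to_all(input_lists):
    """Simulate the list-of-tensors variant.

    input_lists[r][d] is the separate tensor that rank r sends to rank d.
    Returns output_lists, where output_lists[r][s] is the tensor that
    rank r received from rank s. The tensors may differ in size.
    """
    world_size = len(input_lists)
    return [
        [input_lists[src][dst] for src in range(world_size)]
        for dst in range(world_size)
    ]


# Two ranks, one flat buffer each: slices of the buffer are exchanged.
single_out = simulate_all_to_all_single([[0, 1, 2, 3], [10, 11, 12, 13]])

# Two ranks, a list of tensors each: whole tensors are exchanged.
list_out = simulate_all_to_all([[[0], [1, 2]], [[10, 11, 12], [13]]])
```

In the first call each rank ends up with one flat buffer assembled from slices; in the second each rank ends up with a list of whole tensors, one per peer. In the Python API these operations are exposed as `torch.distributed.all_to_all_single` and `torch.distributed.all_to_all`.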
cc: ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44374
Reviewed By: zhangguanheng66, mrshenli
Differential Revision: D24455427
Pulled By: srinivas212
fbshipit-source-id: 42fdebdd14f8340098e2c34ef645bd40603552b1