[c10d] Make alltoall a custom op (#79691)
Summary:
This patch makes alltoall a custom op so that it is dispatcher
passable, i.e., calls to it go through the dispatcher. It is one part of the
effort to route comm ops through the dispatcher so that tracing mechanisms
that rely on the dispatcher, e.g., LazyTensor and AOTAutograd, can trace them.
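The toy, pure-Python sketch below shows why routing a collective through the dispatcher matters for tracing. A tracer that hooks the dispatcher only records ops that actually pass through it. `ToyDispatcher`, `simulated_alltoall`, and the op name `toy::alltoall` are hypothetical illustrations, not PyTorch APIs, and the sketch does not reproduce how PyTorch registers the op. The all-to-all is simulated in a single process: rank `src` sends chunk `inputs[src][dst]` to rank `dst`.

```python
from typing import Callable, Dict, List


class ToyDispatcher:
    """Hypothetical, minimal stand-in for an op dispatcher (not PyTorch's)."""

    def __init__(self) -> None:
        self._kernels: Dict[str, Callable] = {}
        self._tracers: List[List[str]] = []  # stack of active trace logs

    def register(self, name: str, kernel: Callable) -> None:
        self._kernels[name] = kernel

    def call(self, name: str, *args):
        # Every active tracer observes the op before its kernel runs.
        for log in self._tracers:
            log.append(name)
        return self._kernels[name](*args)

    def trace(self, fn: Callable, *args):
        # Run fn while recording every op routed through this dispatcher.
        log: List[str] = []
        self._tracers.append(log)
        try:
            result = fn(*args)
        finally:
            self._tracers.pop()
        return result, log


def simulated_alltoall(inputs: List[List[str]]) -> List[List[str]]:
    # inputs[src][dst] is the chunk rank src sends to rank dst;
    # rank dst receives one chunk from every src, in src order.
    world_size = len(inputs)
    return [[inputs[src][dst] for src in range(world_size)]
            for dst in range(world_size)]


dispatcher = ToyDispatcher()
dispatcher.register("toy::alltoall", simulated_alltoall)


def step_direct(inputs):
    return simulated_alltoall(inputs)  # bypasses the dispatcher


def step_dispatched(inputs):
    return dispatcher.call("toy::alltoall", inputs)  # goes through it


inputs = [["a0", "a1"], ["b0", "b1"]]  # two simulated ranks
_, log_direct = dispatcher.trace(step_direct, inputs)
out, log_dispatched = dispatcher.trace(step_dispatched, inputs)
print(log_direct)      # [] : the direct call is invisible to the tracer
print(log_dispatched)  # ['toy::alltoall']
print(out)             # [['a0', 'b0'], ['a1', 'b1']]
```

Both paths compute the same result; the only difference is whether the tracer can see the op, which is the property this patch gives alltoall.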
Test Plan:
BACKEND=nccl WORLD_SIZE=2 python test/distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_all_to_all_cuda
BACKEND=nccl WORLD_SIZE=2 python test/distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_all_to_all_cuda_complex
BACKEND=nccl WORLD_SIZE=2 python test/distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_all_to_all_full_group_cuda
and other existing distributed tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79691
Approved by: https://github.com/mrshenli, https://github.com/wanchaol