[11/N] Update all_to_all with CPU/CUDA implementations (#86407)
* #83916 [7/N] [Dispatchable Collectives] Update reduce with CPU / CUDA implementations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86407
Approved by: https://github.com/kwen2501