[16/N] Add _allgather_base custom op with CPU/CUDA implementation (#88889)
Differential Revision: [D41227739](https://our.internmc.facebook.com/intern/diff/D41227739)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88889
Approved by: https://github.com/kwen2501