All Gather and gather APIs for Python Objects (#42189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42189
Rehash of https://github.com/pytorch/pytorch/pull/28811, which was several months old.
As part of addressing https://github.com/pytorch/pytorch/issues/23232, this PR adds support for the following APIs:
`allgather_object` and `gather_object` to support gather/allgather of generic, pickable Python objects. This has been a long-requested feature so PyTorch should provide these helpers built-in.
The methodology is what is proposed in the original issue:
1) Pickle object to ByteTensor using torch.save
2) Comm. tensor sizes
3) Copy local ByteTensor into a tensor of maximal size
4) Call tensor-based collectives on the result of (3)
5) Unpickle back into object using torch.load
Note that the API is designed to match other than supporting `async_op`. For now, it is a blocking call. If we see demand to support `async_op`, we will have to make more progress on merging work/future to support this.
If this is a suitable approach, we can support `scatter`, `broadcast` in follow up PRs.
ghstack-source-id: 109322433
Reviewed By: mrshenli
Differential Revision: D22785387
fbshipit-source-id: a265a44ec0aa3aaffc3c6966023400495904c7d8