scatter_object_list API for c10d (#43930)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43930
Closes #23232. As part of addressing #23232, this PR adds support for `scatter_object_list`, an API that scatters arbitrary picklable objects from a source rank to all other ranks.
The implementation approach is similar to that of https://github.com/pytorch/pytorch/pull/42189. The result of the `scatter` is stored as the first element of `scatter_object_output_list`, and the src rank is expected to provide an input list, `scatter_object_input_list`, containing the objects to scatter.
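To make the calling convention concrete, here is a hedged usage sketch of `torch.distributed.scatter_object_list`. It runs as a single-rank Gloo group purely for illustration; in real use each rank would execute this in its own process, and only the src rank's `scatter_object_input_list` would be consulted. The file-based init path is an illustrative choice, not part of the API.

```python
import os
import tempfile

import torch.distributed as dist

def demo():
    # Single-process Gloo group just to exercise the call signature.
    init_file = os.path.join(tempfile.mkdtemp(), "init")
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{init_file}",
        world_size=1,
        rank=0,
    )
    # The src rank supplies one picklable object per rank.
    scatter_object_input_list = [{"rank_payload": [1, 2, 3]}]
    # Each rank receives its object in element 0 of the output list.
    scatter_object_output_list = [None]
    dist.scatter_object_list(
        scatter_object_output_list, scatter_object_input_list, src=0
    )
    dist.destroy_process_group()
    return scatter_object_output_list[0]
```

With `world_size=1` the src rank simply receives its own first object back, which is enough to show where the result lands.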
Note that this API requires one broadcast and two scatters. The broadcast is needed because the maximum object size, which only the src rank knows, must be communicated to all ranks before they can allocate receive buffers. The two scatters then communicate the (padded) serialized objects and their true sizes.
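The size-and-padding handoff above can be sketched in plain Python, with the three collectives (one broadcast plus two scatters) stubbed out as direct data handoffs. The function names and structure here are illustrative only, not the c10d internals:

```python
import pickle

def src_prepare(objects):
    """On the src rank: pickle each object and pad to the max size."""
    payloads = [pickle.dumps(obj) for obj in objects]
    sizes = [len(p) for p in payloads]
    # max_size is what the broadcast would communicate to every rank.
    max_size = max(sizes)
    # Padded payloads go out in the first scatter, true sizes in the second.
    padded = [p.ljust(max_size, b"\0") for p in payloads]
    return max_size, padded, sizes

def rank_receive(max_size, padded_payload, true_size):
    """On each rank: strip the padding using the true size, then unpickle."""
    assert len(padded_payload) == max_size
    return pickle.loads(padded_payload[:true_size])
```

Padding every payload to a common `max_size` is what lets each rank pre-allocate a fixed-size receive buffer before the scatter; the second scatter of true sizes is then required to recover the exact pickle boundary.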
Note that the API is designed to match the tensor-based collectives, except that it does not support async_op; for now it is a blocking call. If we see demand for async_op support, we will need to make more progress on merging Work/Future to support it.
This API is currently supported only with the Gloo backend, since NCCL does not implement scatter.
ghstack-source-id: 117904065
Reviewed By: mrshenli
Differential Revision: D23430686
fbshipit-source-id: f033b89cd82dadd194f2b036312a98423449c26b