[futures] Add torch.futures.collect_all()/wait_all() Python API. (#39790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39790
The "[fut.wait() for fut in futs]" idiom can introduce up to
O(len(futs)) thread switches, which may be excessive when futs is large.
This change exposes the new C++ c10::collectAll() to Python as
torch.futures.collect_all() and torch.futures.wait_all(), so that
callers need only a single JIT-side wait.
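
For illustration, a minimal sketch of how the two idioms compare from
Python. The futures here are completed by hand as stand-ins for RPC
results (an assumption for the example, not part of this change), and
the variable names are hypothetical:

```python
import torch

# Stand-in futures, completed manually so the example runs without RPC.
futs = [torch.futures.Future() for _ in range(3)]
for i, fut in enumerate(futs):
    fut.set_result(i * 10)

# Old idiom: one wait per future.
old_results = [fut.wait() for fut in futs]

# collect_all() returns a single Future that completes once every input
# future has completed; its value is the list of the input futures.
combined = torch.futures.collect_all(futs)
collected_results = [fut.wait() for fut in combined.wait()]

# wait_all() blocks until all futures complete and returns their values.
all_results = torch.futures.wait_all(futs)

print(old_results, collected_results, all_results)
```

All three lists should hold the same values; only the number of
separate waits differs.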
ghstack-source-id: 105779443
Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:rpc_spawn
Reviewed By: kiukchung
Differential Revision: D21976891
fbshipit-source-id: 253c61f503f4ffb9be784e6c49a0656cede139fb