[futures] Reland: Add torch.futures.collect_all()/wait_all() python api. (#39964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39964
The "[fut.wait() for fut in futs]" idiom can introduce up to
O(len(futs)) thread switches, which may be excessive for large N.
This plumbs through the new c++ c10::collectAll() to Python space
so that we only employ a single jit-side wait.
Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:rpc_spawn
Differential Revision: D22027412
fbshipit-source-id: 4e344a19a09638ee46e7fc478df80a41941b84ce