Thread PG: add allreduce to threaded pg (#89043)
Summary:
Goal
Add the `all_reduce` collective to the multi-threaded ProcessGroup introduced in D40236769 (https://github.com/pytorch/pytorch/commit/6663ae5537f3c61030ba4d425bd57a097c51430a).
Code Motion
Added the `allreduce` collective to ProcessLocalGroup (a subclass of c10d ProcessGroup).
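For readers unfamiliar with the idea, here is a rough sketch of how a SUM allreduce can work when ranks are threads in one process. It is not the ProcessLocalGroup code from this change: `ToyThreadedAllReduce` and `run` are hypothetical names, and plain Python lists stand in for tensors so the example needs only the standard library.

```python
import threading


class ToyThreadedAllReduce:
    """Hypothetical helper: SUM allreduce across threads acting as ranks."""

    def __init__(self, world_size):
        self.world_size = world_size
        self._contributions = [None] * world_size
        self._result = None
        # The barrier action runs once, in a single thread, after every
        # rank has arrived and before any rank is released.
        self._barrier = threading.Barrier(world_size, action=self._reduce)

    def _reduce(self):
        # Element-wise SUM over the contributions of all ranks.
        self._result = [sum(vals) for vals in zip(*self._contributions)]

    def allreduce(self, rank, data):
        # Publish this rank's input, then wait for all other ranks.
        self._contributions[rank] = list(data)
        self._barrier.wait()
        # Write the reduced values back in place, as a collective would.
        data[:] = self._result


def run(world_size=4):
    group = ToyThreadedAllReduce(world_size)
    outputs = [None] * world_size

    def worker(rank):
        data = [rank, 2 * rank]  # per-rank input
        group.allreduce(rank, data)
        outputs[rank] = data

    threads = [
        threading.Thread(target=worker, args=(r,)) for r in range(world_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outputs
```

Calling `run(4)` leaves every rank holding the same element-wise sum of all inputs. Doing the reduction in the barrier action keeps the sketch short; a real implementation may coordinate ranks differently.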
What's Next
Add a DDP test utilizing the new allreduce op.
Generalize `allreduce` to allow other `ReduceOp`s besides `SUM`.
Test Plan:
cd fbcode/caffe2
buck2 test mode/dev //caffe2/test/distributed:multi_threaded
Differential Revision: D41046606
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89043
Approved by: https://github.com/wanchaol