[torch distributed] Implement reduce_scatter_base (#57567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57567
Support flattened reduce_scatter: the input is a single contiguous tensor instead of a list of per-rank tensors.
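For reviewers unfamiliar with the flattened form, here is a minimal pure-Python sketch of the intended semantics. It assumes "flattened" means each rank contributes one flat buffer of length world_size * chunk and receives back its own chunk of the elementwise sum. The name `reduce_scatter_flat` is hypothetical and this is a single-process simulation, not the ProcessGroupNCCL code path.

```python
from typing import List, Sequence


def reduce_scatter_flat(inputs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Simulate a sum reduce-scatter over flat per-rank buffers.

    inputs[r] is the flat input buffer of rank r (hypothetical layout:
    world_size contiguous chunks of equal length). The return value holds,
    at index r, the chunk that rank r would receive.
    """
    world_size = len(inputs)
    if world_size == 0:
        raise ValueError("need at least one rank")
    total = len(inputs[0])
    if any(len(buf) != total for buf in inputs):
        raise ValueError("all ranks must pass buffers of the same length")
    if total % world_size != 0:
        raise ValueError("buffer length must be divisible by world_size")
    chunk = total // world_size

    # Reduce: elementwise sum of the flat buffers across all ranks.
    reduced = [sum(buf[i] for buf in inputs) for i in range(total)]

    # Scatter: rank r keeps the r-th contiguous chunk of the reduced buffer.
    return [reduced[r * chunk:(r + 1) * chunk] for r in range(world_size)]


if __name__ == "__main__":
    # Two simulated ranks, each with a flat buffer of 2 chunks of 2 elements.
    outputs = reduce_scatter_flat([[1, 2, 3, 4], [10, 20, 30, 40]])
    print(outputs)
```

Compared with the list-based reduce_scatter, the flat layout lets the caller hand over one contiguous buffer rather than a list of world_size tensors.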
Test Plan:
buck test mode/opt -c fbcode.enable_gpu_sections=true //caffe2/torch/lib/c10d:ProcessGroupNCCLTest
buck test mode/opt -c fbcode.enable_gpu_sections=true //caffe2/test/distributed:c10d
Reviewed By: zhaojuanmao
Differential Revision: D27876281
fbshipit-source-id: 58e2edfb1baff5cdc083dbaaba9f19502ef0b298