[DDP] Merge work and future_work in reducer (#58937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58937
Remove the `work` attribute from the `Reducer` class in favor of `future_work`.
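
For context, a minimal C++ sketch of the shape of this merge, assuming illustrative stand-in names (`Bucket`, `finalize_bucket`) that are not taken from the PR: each bucket previously tracked both a `ProcessGroup::Work` handle and a future, and after this change only the future remains as the single completion handle.

```cpp
#include <ATen/core/ivalue.h>
#include <c10/util/intrusive_ptr.h>

// Illustrative stand-in for the Reducer's per-bucket state.
struct Bucket {
  // Before: both a c10d::ProcessGroup::Work handle and a Future were kept.
  // After: a single Future covers both the comm-hook and allreduce paths.
  c10::intrusive_ptr<c10::ivalue::Future> future_work;
};

// Illustrative stand-in for the wait in Reducer::finalize_backward().
void finalize_bucket(Bucket& bucket) {
  bucket.future_work->wait();  // one wait path instead of work->wait()
}
```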
Additionally, remove the `copy_grad_to_bucket` method, since it is now a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` to replace the plain allreduce path and also support handling of uneven inputs.
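
A minimal sketch of what such a div-factor allreduce hook could look like, assuming the c10d `CppCommHookInterface`/`GradBucket` interface of that era; the method and member names used here (`runHook`, `getTensors`, `setDivFactor`, `divFactor_`) are illustrative assumptions, not verbatim from this PR.

```cpp
#include <c10d/ProcessGroup.hpp>
#include <c10d/comm.hpp>

namespace c10d {

// Sketch of a built-in C++ hook that allreduces a gradient bucket and
// divides by a configurable factor instead of a fixed world size.
class _AllReduceCommHookWithDivFactor
    : public CppCommHookInterface<ProcessGroup*> {
 public:
  explicit _AllReduceCommHookWithDivFactor(ProcessGroup* pg)
      : CppCommHookInterface<ProcessGroup*>(pg) {}

  c10::intrusive_ptr<c10::ivalue::Future> runHook(GradBucket& bucket) override {
    auto tensors = bucket.getTensors();
    // Folding the division into the hook lets one code path handle uneven
    // inputs: under join(), divFactor_ is the number of ranks that actually
    // contributed gradients, which can be smaller than the world size.
    for (auto& tensor : tensors) {
      tensor.div_(divFactor_);
    }
    return state_->allreduce(tensors)->getFuture();
  }

  void setDivFactor(int div_factor) {
    divFactor_ = div_factor;
  }

 private:
  int divFactor_{1};
};

} // namespace c10d
```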
Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130673249
Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
Reviewed By: agolynski
Differential Revision: D28677383
fbshipit-source-id: 85e0620378b7e9d837e436e94b9d807631d7d752