[Zero2] Reduce the unnecessary all-reduce when tensor size is 0. (#5868)
When running ZeRO stage 2 with a reduce_bucket_size that is not large
enough, self.elements_in_ipg_bucket can be 0, so the function
average_tensor is called with a tensor of size 0:
https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L1372
Enabling reduce_scatter can serve as a workaround:
https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L1066
If the user sets reduce_scatter=false, the function
gradient_reduction_w_predivide performs an unnecessary all-reduce on a
tensor of size 0:
https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L974
This PR adds a check that skips this unnecessary all-reduce.
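
The idea of the check can be sketched roughly as follows. This is an
illustration only, not the code changed in this PR: reduce_if_nonempty
and fake_all_reduce are hypothetical names, and a plain Python list
stands in for the flattened gradient tensor.

```python
from typing import Callable, List


def reduce_if_nonempty(flat_grads: List[float],
                       all_reduce: Callable[[List[float]], None]) -> bool:
    """Hypothetical guard: run the collective only when there is data.

    Returns True if the all-reduce was issued, False if it was skipped.
    """
    # len(...) == 0 plays the role of tensor.numel() == 0 here.
    if len(flat_grads) == 0:
        return False
    all_reduce(flat_grads)
    return True


# Stand-in for a real collective; it only records the buffer sizes it saw.
calls: List[int] = []


def fake_all_reduce(buf: List[float]) -> None:
    calls.append(len(buf))


skipped = reduce_if_nonempty([], fake_all_reduce)          # empty bucket
ran = reduce_if_nonempty([0.5, 1.5], fake_all_reduce)      # non-empty bucket
```

In a real distributed run, every rank has to reach the same skip
decision, since a collective that only some ranks enter would hang.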
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>