DeepSpeed
24d1d86b - [Zero2] Reduce the unnecessary all-reduce when tensor size is 0. (#5868)

Commit
1 year ago
[Zero2] Reduce the unnecessary all-reduce when tensor size is 0. (#5868)

When running ZeRO stage 2 with a reduce_bucket_size that is not large enough, self.elements_in_ipg_bucket will be 0, so the input to the function average_tensor is a tensor with size=0:
https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L1372

Using reduce_scatter can serve as a workaround:
https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L1066

If the user sets reduce_scatter=false, the function gradient_reduction_w_predivide performs an unnecessary all-reduce on a tensor of size 0:
https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L974

This PR adds a check to skip this unnecessary all-reduce.

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
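
The check described above can be illustrated with a minimal sketch. This is not the DeepSpeed code from the commit: FlatBuffer, reduce_if_nonempty, and fake_all_reduce are hypothetical names used only for illustration, and a plain Python object stands in for the flattened gradient tensor so the example runs without a distributed setup.

```python
from typing import Callable, List


class FlatBuffer:
    """Hypothetical stand-in for a flattened gradient tensor."""

    def __init__(self, values: List[float]):
        self.values = list(values)

    def numel(self) -> int:
        # Number of elements, mirroring the tensor method of the same name.
        return len(self.values)


def reduce_if_nonempty(buf: FlatBuffer,
                       all_reduce: Callable[[FlatBuffer], None]) -> bool:
    """Issue the collective only when the buffer holds at least one element.

    Returns True if all_reduce was called, False if it was skipped.
    """
    if buf.numel() == 0:
        # Nothing to reduce, so skip the communication call entirely.
        return False
    all_reduce(buf)
    return True


# Record each call so the effect of the guard is visible.
calls: List[int] = []


def fake_all_reduce(buf: FlatBuffer) -> None:
    # Assumed placeholder for a real all-reduce; it only logs the size.
    calls.append(buf.numel())


issued_empty = reduce_if_nonempty(FlatBuffer([]), fake_all_reduce)
issued_full = reduce_if_nonempty(FlatBuffer([0.5, 1.5]), fake_all_reduce)
```

With a real collective, such a guard is only safe if every rank makes the same decision, since a collective that some ranks skip and others enter will hang.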