DeepSpeed
24d1d86b - [Zero2] Reduce the unnecessary all-reduce when tensor size is 0. (#5868)

Commit

1 year ago

[Zero2] Reduce the unnecessary all-reduce when tensor size is 0. (#5868)

When running ZeRO stage 2 with a reduce_bucket_size that is not large enough, self.elements_in_ipg_bucket will be 0, so the input to average_tensor is a tensor with size 0: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L1372

Using reduce_scatter can serve as a workaround: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L1066

If the user sets reduce_scatter=false, gradient_reduction_w_predivide performs an unnecessary all-reduce on a tensor of size 0: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L974

This PR adds a check to skip this unnecessary all-reduce.

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
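
The kind of check described above can be sketched in plain Python. This is an illustration only, not the actual diff from the PR: the names reduce_if_nonempty and fake_all_reduce are hypothetical, a list stands in for the flattened gradient tensor, and a plain callable stands in for the collective. The sketch also assumes every rank sees the same empty bucket, so all ranks skip the collective together.

```python
from typing import Callable, List

# Hypothetical stand-in for the collective; a real all-reduce would
# communicate across ranks. Here it only records that it was called.
collective_calls: List[int] = []


def fake_all_reduce(values: List[float]) -> List[float]:
    collective_calls.append(len(values))
    return list(values)


def reduce_if_nonempty(
    flat_grads: List[float],
    all_reduce: Callable[[List[float]], List[float]],
) -> List[float]:
    """Return flat_grads unchanged when it is empty, else reduce it.

    Assumption: emptiness is identical on every rank, so skipping the
    collective cannot leave other ranks waiting.
    """
    if len(flat_grads) == 0:
        # Nothing to reduce, so avoid the communication call entirely.
        return flat_grads
    return all_reduce(flat_grads)


if __name__ == "__main__":
    reduce_if_nonempty([], fake_all_reduce)          # collective skipped
    reduce_if_nonempty([0.5, 1.5], fake_all_reduce)  # collective invoked once
    print(len(collective_calls))
```

With a real tensor the emptiness test would presumably be based on the tensor's element count rather than len, but the control flow is the same: return early before issuing the collective.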

References

#5868 - [Zero2] Reduce the unnecessary all-reduce when tensor size is 0.

Author

ys950902

Parents

862aff37
