DeepSpeed
a85b6e47 - Fix bug where ZeRO2 never uses the reduce method. (#4946)

Commit
1 year ago
Fix bug where ZeRO2 never uses the reduce method. (#4946)

PR https://github.com/microsoft/DeepSpeed/pull/4695 moved the gradient synchronization operation into the `allreduce_bucket` method. In that method, however, rank is set to None, so the reduce method is never used, even when `use_multi_rank_bucket_allreduce` is set to False.

Co-authored-by: jializheng <jializheng@huawei.com>
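
The failure pattern described in the commit message can be illustrated with a small, self-contained sketch. This is not DeepSpeed's code: `choose_collective`, `sync_bucket_buggy`, `sync_bucket_fixed`, and `owner_rank` are hypothetical names, and the sketch assumes only what the message states, namely that a rank of None selects all-reduce and that a concrete rank is needed to select reduce.

```python
from typing import Optional


def choose_collective(rank: Optional[int]) -> str:
    """Hypothetical stand-in for the dispatch inside a bucket-sync helper."""
    # No destination rank: every rank needs the result, so all-reduce.
    # A destination rank: only that rank needs the result, so reduce.
    # `is None` matters here because rank 0 is a valid destination.
    return "all_reduce" if rank is None else "reduce"


def sync_bucket_buggy(owner_rank: int, use_multi_rank_bucket_allreduce: bool) -> str:
    # Bug pattern: the destination rank is always dropped, so the flag
    # has no effect and reduce is never selected.
    return choose_collective(rank=None)


def sync_bucket_fixed(owner_rank: int, use_multi_rank_bucket_allreduce: bool) -> str:
    # Assumed fix pattern: forward the destination rank unless the
    # multi-rank all-reduce path was requested.
    rank = None if use_multi_rank_bucket_allreduce else owner_rank
    return choose_collective(rank=rank)


if __name__ == "__main__":
    print(sync_bucket_buggy(0, use_multi_rank_bucket_allreduce=False))
    print(sync_bucket_fixed(0, use_multi_rank_bucket_allreduce=False))
```

In this sketch the buggy variant ignores the flag entirely, while the fixed variant falls back to all-reduce only when `use_multi_rank_bucket_allreduce` is True.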