DeepSpeed
be8124c8 - Fix ZeRO stage 1 and add stage 2 support with DeepCompile (#7366)

Committed 187 days ago
This PR fixes the behavior of DeepCompile's ZeRO stage 1 and adds stage 2 support.

DeepCompile's ZeRO1 currently performs an allreduce at every iteration, even when the iteration is not a gradient accumulation boundary. This significantly slows down training when gradient accumulation is enabled. This PR fixes the issue by performing the allreduce only at the gradient accumulation boundary.

Because the current behavior is similar to ZeRO2, this PR also adds ZeRO2 support to DeepCompile: the ZeRO stage can now be set to 2 when DeepCompile is enabled.

Loss values, performance, and memory usage were verified using this [verification tool](https://github.com/tohtana/ds_verify_loss) ([results](https://github.com/tohtana/ds_verify_loss/blob/main/results/results_20250617_035117/report.md)).

---------

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
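To illustrate why deferring the allreduce to the gradient accumulation boundary saves communication without changing the result, here is a minimal pure-Python simulation. It is not DeepSpeed or DeepCompile code: ranks are simulated as lists, `allreduce_mean`, `reduce_every_microstep`, and `reduce_at_boundary` are hypothetical helpers, and the allreduce is modeled as an element-wise mean across ranks.

```python
from typing import List, Tuple

Grad = List[float]


def allreduce_mean(per_rank: List[Grad]) -> Grad:
    """Simulated allreduce: element-wise mean of one gradient vector per rank."""
    num_ranks = len(per_rank)
    return [sum(vals) / num_ranks for vals in zip(*per_rank)]


def reduce_every_microstep(micro_grads: List[List[Grad]], accum_steps: int) -> Tuple[Grad, int]:
    """Sketch of the old behavior: one collective after every micro-step."""
    dim = len(micro_grads[0][0])
    accumulated = [0.0] * dim
    num_collectives = 0
    for step in range(accum_steps):
        reduced = allreduce_mean(micro_grads[step])  # communicate every micro-step
        num_collectives += 1
        accumulated = [a + r for a, r in zip(accumulated, reduced)]
    return accumulated, num_collectives


def reduce_at_boundary(micro_grads: List[List[Grad]], accum_steps: int) -> Tuple[Grad, int]:
    """Sketch of the fixed behavior: accumulate locally, communicate at the boundary."""
    num_ranks = len(micro_grads[0])
    dim = len(micro_grads[0][0])
    local = [[0.0] * dim for _ in range(num_ranks)]
    result: Grad = []
    num_collectives = 0
    for step in range(accum_steps):
        # Each rank adds its micro-step gradient to its own buffer; no communication.
        for rank in range(num_ranks):
            local[rank] = [acc + g for acc, g in zip(local[rank], micro_grads[step][rank])]
        if (step + 1) % accum_steps == 0:  # gradient accumulation boundary
            result = allreduce_mean(local)
            num_collectives += 1
    return result, num_collectives


# micro_grads[step][rank] is one rank's gradient at one micro-step (2 ranks, 4 micro-steps).
grads = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[0.5, 1.5], [2.5, 3.5]],
    [[2.0, 0.0], [0.0, 2.0]],
]
print(*reduce_every_microstep(grads, 4))  # [10.5, 13.5] 4
print(*reduce_at_boundary(grads, 4))      # [10.5, 13.5] 1
```

Because averaging is linear, the sum of per-step means equals the mean of per-rank sums, so the boundary-only variant reaches the same accumulated gradient with one collective instead of one per micro-step.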
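A DeepSpeed configuration that combines ZeRO stage 2 with DeepCompile might look roughly like the sketch below. This assumes DeepCompile is enabled through a `"compile": {"deepcompile": true}` section as described in DeepSpeed's DeepCompile documentation; verify the key against your DeepSpeed version. The batch-size and accumulation values are arbitrary placeholders, and the `deepspeed.initialize` / `engine.compile()` calls are shown only as comments because they need a model and a DeepSpeed installation.

```python
import json

# Placeholder values; only "zero_optimization.stage" and the "compile"
# section are relevant to this change.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,      # allreduce should now happen once per 8 micro-steps
    "zero_optimization": {"stage": 2},     # stage 2 is now accepted together with DeepCompile
    "compile": {"deepcompile": True},      # assumed key for enabling DeepCompile
}

# Assumed usage with DeepSpeed installed (not executed here):
#   engine, optimizer, _, _ = deepspeed.initialize(
#       model=model, model_parameters=model.parameters(), config=ds_config)
#   engine.compile()

print(json.dumps(ds_config, indent=2))
```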