DeepSpeed
b7565400 - disables ZeRO checkpoint loading path when stage=0 (#7586)

Commit
93 days ago
disables ZeRO checkpoint loading path when stage=0 (#7586)

Fixes #7571

When ZeRO is disabled (stage 0) and bf16 is enabled, the current guard sets `load_zero_checkpoint=True`, which leads to `_load_zero_checkpoint` and `_restore_from_bit16_weights()` being called even though no ZeRO state exists.

This PR removes the `self.bfloat16_enabled()` condition so that `load_zero_checkpoint` is tied strictly to `self.zero_optimization()`.

- Stage 0 (BF16/FP16/FP32): cleanly skips ZeRO checkpoint path.
- Stage ≥ 1: loads ZeRO partitioned optimizer state as before.

cc @sfc-gh-truwase

Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
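The change to the guard can be illustrated with a small standalone sketch. This is not the DeepSpeed source: the two function names and their plain-argument signatures are hypothetical, and the "before" version is simplified to just the two conditions the commit message mentions (ZeRO enabled, bf16 enabled). It assumes that "ZeRO enabled" is equivalent to a stage greater than 0.

```python
# Simplified, hypothetical model of the guard described in the commit message.
# These helpers are illustrative only and do not exist in DeepSpeed.


def load_zero_checkpoint_before(zero_stage: int, bf16_enabled: bool) -> bool:
    """Old behavior (simplified): bf16 alone was enough to take the ZeRO path."""
    zero_enabled = zero_stage > 0  # assumption: stage 0 means ZeRO is disabled
    return zero_enabled or bf16_enabled


def load_zero_checkpoint_after(zero_stage: int) -> bool:
    """New behavior: the ZeRO path depends only on whether ZeRO is enabled."""
    return zero_stage > 0


if __name__ == "__main__":
    # Compare both guards for every stage with bf16 off and on.
    for stage in (0, 1, 2, 3):
        for bf16 in (False, True):
            before = load_zero_checkpoint_before(stage, bf16)
            after = load_zero_checkpoint_after(stage)
            print(f"stage={stage} bf16={bf16} before={before} after={after}")
```

In this model the two guards differ only for stage 0 with bf16 enabled, which is the case reported in #7571.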