DeepSpeed
22cf1a44 - Fix(scheduler): WarmupLR inherits optimizer lr when not specified (#7360)

Fix(scheduler): WarmupLR inherits optimizer lr when not specified (#7360)

This PR fixes issue #7303.

### 1. Description of the Bug

Currently, when the `WarmupLR` scheduler is used without `warmup_max_lr` explicitly set in the scheduler's parameters, it falls back to its internal default value (`0.001`) and ignores the learning rate set in the optimizer's parameters. This can lead to unexpected training behavior that diverges from the user's intent.

### 2. Description of the Fix

This fix modifies the `__init__` method of the `WarmupLR` scheduler in `deepspeed/runtime/lr_schedules.py`:

- The default value of the `warmup_max_lr` argument in the function signature is changed from `0.001` to `None`.
- Logic is added to check whether `warmup_max_lr` is `None` at initialization. If it is, the scheduler now inherits the learning rate from the optimizer's parameter groups.

This ensures that the optimizer's learning rate is respected as the default `warmup_max_lr`, aligning the scheduler's behavior with the user's configuration intent (a sketch of the change is included below).

### 3. Verification

The fix has been verified with a minimal reproduction script that demonstrates the behavioral change (a usage sketch is also included below).

**Before Fix:** Without `warmup_max_lr` in the scheduler config, the learning rate incorrectly defaults to `0.001`.

<img width="1711" alt="Screenshot 2025-06-16 at 18 34 31" src="https://github.com/user-attachments/assets/fe68f39e-2bbc-4f94-b322-546d9ce43bb0" />

**After Workaround (Demonstrating the Mechanism):** Explicitly adding `warmup_max_lr` to the scheduler config makes the learning rate behave as expected; this code change makes that the default behavior.

<img width="1195" alt="Screenshot 2025-06-16 at 20 17 11" src="https://github.com/user-attachments/assets/cc170246-fdac-4a56-8b9c-f204ebb47895" />

Signed-off-by: Vensenmu <vensenmu@gmail.com>
Co-authored-by: Olatunji Ruwase <tjruwase@gmail.com>
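The following is a minimal sketch of the behavioral change described in section 2, assuming the scheduler keeps the general shape of `WarmupLR` in `deepspeed/runtime/lr_schedules.py`. The `_per_group` helper and the exact attribute names here are illustrative, not the verbatim upstream diff.

```python
# Sketch only: illustrates inheriting the optimizer lr when warmup_max_lr is
# omitted; not the verbatim upstream implementation.
from torch.optim import Optimizer


class WarmupLR:

    def __init__(self,
                 optimizer: Optimizer,
                 warmup_min_lr: float = 0.0,
                 warmup_max_lr: float = None,  # previously defaulted to 0.001
                 warmup_num_steps: int = 1000,
                 last_batch_iteration: int = -1):
        self.optimizer = optimizer
        # New behavior: if the scheduler config omits warmup_max_lr, inherit
        # the learning rate(s) already configured on the optimizer's parameter
        # groups instead of silently falling back to 0.001.
        if warmup_max_lr is None:
            warmup_max_lr = [group['lr'] for group in optimizer.param_groups]
        self.min_lrs = self._per_group(warmup_min_lr)
        self.max_lrs = self._per_group(warmup_max_lr)
        self.warmup_num_steps = max(2, warmup_num_steps)
        self.last_batch_iteration = last_batch_iteration

    def _per_group(self, value):
        # Broadcast a scalar across parameter groups; pass lists through
        # unchanged (hypothetical helper standing in for the scheduler's own
        # per-group formatting logic).
        if isinstance(value, (list, tuple)):
            return list(value)
        return [value] * len(self.optimizer.param_groups)
```

With this default, a user who sets `lr` only in the optimizer section of the DeepSpeed config gets a warmup that ramps toward that value rather than toward `0.001`.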
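Along the lines of the minimal reproduction mentioned in section 3, a script of roughly the following shape (launched with `deepspeed repro.py`) shows the difference. The model, batch size, and learning rate are placeholders, not values from the PR, and the final inspection of the param group lr assumes a single-GPU fp32 setup.

```python
# Hypothetical reproduction sketch: the scheduler config deliberately omits
# warmup_max_lr, so after the fix the warmup target should track the
# optimizer lr (0.01) instead of the old 0.001 default.
import torch
import deepspeed

model = torch.nn.Linear(8, 8)
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 0.01}},
    "scheduler": {
        "type": "WarmupLR",
        "params": {"warmup_min_lr": 0.0, "warmup_num_steps": 100},
    },
}

engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config)

# Take one training step so the scheduler computes a warmed-up lr, then
# inspect the optimizer's effective learning rate.
batch = torch.randn(8, 8, device=engine.device)
loss = engine(batch).sum()
engine.backward(loss)
engine.step()

# Before the fix this value was derived from 0.001; afterwards it should be
# derived from the optimizer's configured lr of 0.01.
print(optimizer.param_groups[0]["lr"])
```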