Fix(scheduler): WarmupLR inherits optimizer lr when not specified (#7360)
This PR fixes issue #7303.
### 1. Description of the Bug
Currently, when using the `WarmupLR` scheduler, if `warmup_max_lr` is
not explicitly set in the scheduler's parameters, the scheduler silently
falls back to the argument's internal default value (`0.001`), ignoring
the learning rate configured on the optimizer. This can lead to
unexpected training behavior that diverges from the user's configured
intent.
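For example, with a DeepSpeed config along these lines (the optimizer
type and values are illustrative, not taken from the issue), the warmup
previously targeted `0.001` rather than the intended `5e-5`:

```python
ds_config = {
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 5e-5},  # the learning rate the user actually wants
    },
    "scheduler": {
        "type": "WarmupLR",
        # warmup_max_lr is omitted here, so the scheduler previously warmed
        # up toward its internal default of 0.001 instead of 5e-5.
        "params": {"warmup_num_steps": 1000},
    },
}
```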
### 2. Description of the Fix
This fix modifies the `__init__` method of the `WarmupLR` scheduler in
`deepspeed/runtime/lr_schedules.py`.
- The default value for the `warmup_max_lr` argument in the function
signature is changed from `0.001` to `None`.
- Logic is added to check if `warmup_max_lr` is `None` upon
initialization. If it is, the scheduler now correctly inherits the
learning rate from the optimizer's parameter groups.
This change ensures that the optimizer's learning rate is respected as
the default `warmup_max_lr`, aligning the scheduler's behavior with the
user's configuration intent.
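A minimal sketch of the inheritance logic, assuming standard
`torch.optim` parameter groups (the helper name below is illustrative;
in the actual change the equivalent check lives inside
`WarmupLR.__init__`):

```python
from typing import List, Optional, Union

import torch
from torch.optim import Optimizer


def resolve_warmup_max_lr(optimizer: Optimizer,
                          warmup_max_lr: Optional[Union[float, List[float]]] = None):
    """Illustrative helper: pick the warmup target lr for each param group."""
    if warmup_max_lr is None:
        # No explicit value given: inherit the lr already configured on
        # each of the optimizer's parameter groups.
        return [group["lr"] for group in optimizer.param_groups]
    return warmup_max_lr


if __name__ == "__main__":
    opt = torch.optim.Adam(torch.nn.Linear(4, 4).parameters(), lr=5e-5)
    print(resolve_warmup_max_lr(opt))  # -> [5e-05], not 0.001
```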
### 3. Verification
The fix has been verified using a minimal reproduction script that
clearly demonstrates the behavioral change.
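A sketch along the lines of that script, constructing the scheduler
directly (the model and learning rate here are illustrative):

```python
import torch
from deepspeed.runtime.lr_schedules import WarmupLR

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

# No warmup_max_lr passed: before the fix the warmup silently targeted
# the internal default of 0.001; after the fix it targets 5e-5.
scheduler = WarmupLR(optimizer, warmup_num_steps=10)

for step in range(12):
    scheduler.step()
    print(step, scheduler.get_lr())
```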
**Before Fix:**
Without `warmup_max_lr` in the scheduler config, the learning rate
incorrectly defaults to `0.001`.
<img width="1711" alt="Screenshot 2025-06-16 at 18 34 31"
src="https://github.com/user-attachments/assets/fe68f39e-2bbc-4f94-b322-546d9ce43bb0"
/>
**After Workaround (Demonstrating the Mechanism):**
Explicitly adding `warmup_max_lr` to the scheduler config makes the
learning rate behave as expected. This fix makes that the default
behavior: when `warmup_max_lr` is omitted, the scheduler now inherits
the optimizer's learning rate, so the explicit workaround is no longer
needed.
<img width="1195" alt="Screenshot 2025-06-16 at 20 17 11"
src="https://github.com/user-attachments/assets/cc170246-fdac-4a56-8b9c-f204ebb47895"
/>
Signed-off-by: Vensenmu <vensenmu@gmail.com>
Co-authored-by: Olatunji Ruwase <tjruwase@gmail.com>