Add test coverage for Muon muon_lr/adam_lr overrides (#8047)
## Summary
Add coverage for separate learning rate overrides in the Muon optimizer
path and fix the related Muon blog documentation.
## Background
Muon parameters and non-Muon parameters are automatically split into
separate optimizer groups. The intended behavior is:
- `muon_lr` applies to Muon parameter groups
- `adam_lr` applies to Adam parameter groups
- `lr` remains the fallback for both groups when the overrides are not provided
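A minimal sketch of this behavior, in plain Python. `FakeParam`, `build_param_groups`, and the `use_muon` flag are hypothetical names chosen for illustration, not DeepSpeed's actual API; the real split happens inside DeepSpeed on real model parameters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeParam:
    """Hypothetical stand-in for a model parameter; `use_muon` marks Muon eligibility."""
    name: str
    use_muon: bool


def build_param_groups(params: List[FakeParam], lr: float,
                       muon_lr: Optional[float] = None,
                       adam_lr: Optional[float] = None) -> List[dict]:
    """Split parameters into a Muon group and an Adam group.

    Each group uses its specific override when one is given,
    otherwise it falls back to the shared `lr`.
    """
    muon_params = [p for p in params if p.use_muon]
    adam_params = [p for p in params if not p.use_muon]
    return [
        {"params": muon_params, "use_muon": True,
         "lr": muon_lr if muon_lr is not None else lr},
        {"params": adam_params, "use_muon": False,
         "lr": adam_lr if adam_lr is not None else lr},
    ]


params = [FakeParam("weight", use_muon=True), FakeParam("bias", use_muon=False)]
groups = build_param_groups(params, lr=1e-3, muon_lr=2e-2)
# Muon group gets the override; Adam group falls back to `lr`.
print(groups[0]["lr"], groups[1]["lr"])
```

The sketch checks `is not None` rather than truthiness so that an explicit override of `0.0` is still honored instead of silently falling back to `lr`.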
## Changes
- add a parameterized test covering:
  - legacy `lr` fallback behavior
  - separate `muon_lr` / `adam_lr` override behavior
- fix the Muon blog table header so that it labels `muon_lr` and `adam_lr` correctly
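The two scenarios the test covers can be pictured as a small case table. This is a simplified, hypothetical sketch: `CASES` and `resolve_group_lrs` are illustrative names, not the contents of `test_muon_partial_training.py`, which exercises a real distributed DeepSpeed engine. In pytest, rows like these would typically be fed to `@pytest.mark.parametrize`; here they are checked in a plain loop so the sketch has no dependencies.

```python
from typing import Optional, Tuple

# Hypothetical case table for the two scenarios:
# (case id, lr, muon_lr, adam_lr, expected Muon-group lr, expected Adam-group lr)
CASES = [
    ("legacy_lr_fallback", 1e-3, None, None, 1e-3, 1e-3),
    ("separate_overrides", 1e-3, 2e-2, 3e-4, 2e-2, 3e-4),
]


def resolve_group_lrs(lr: float, muon_lr: Optional[float],
                      adam_lr: Optional[float]) -> Tuple[float, float]:
    """Hypothetical helper: return (muon_group_lr, adam_group_lr), using `lr` as fallback."""
    muon = muon_lr if muon_lr is not None else lr
    adam = adam_lr if adam_lr is not None else lr
    return muon, adam


def check_cases() -> None:
    """Assert every row of the case table resolves to its expected pair."""
    for case_id, lr, muon_lr, adam_lr, want_muon, want_adam in CASES:
        got = resolve_group_lrs(lr, muon_lr, adam_lr)
        assert got == (want_muon, want_adam), f"{case_id}: got {got}"


check_cases()
```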
## Validation
Ran:
`python -m pytest DeepSpeed/tests/unit/ops/muon/test_muon_partial_training.py -k learning_rate_overrides -q -rs`
Result:
- the test was collected successfully
- the test was skipped locally because this distributed test requires 2 GPUs, while the local environment has only 1 GPU
---------
Signed-off-by: Sowndappan S <147894621+sowndappan5@users.noreply.github.com>