Address review feedback for Muon ZeRO Stage 3 support

- Move the save_muon_momentum_buffer_in_memory option into
  DeepSpeedZeroConfig in config.py instead of reading it inline from
  the ds_config dict
- Fix index bug: change muon_momentum_buffer_partitioned_groups_flat
  from a list to a dict keyed by sub-group index, avoiding
  out-of-bounds access when non-Muon groups precede Muon groups
- Add a valid code path for non-swappable (GPU/CPU) optimizers when
  save_muon_momentum_buffer_in_memory is not enabled, replacing the
  previous ValueError
- Validate that all Muon parameter groups share the same momentum (beta)
- Parametrize tests over save_muon_momentum_buffer_in_memory=True and
  save_muon_momentum_buffer_in_memory=False
- Update docs to show config under zero_optimization
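
For reviewers, a minimal sketch of the indexing problem and the momentum
check described above. Everything here is hypothetical and for illustration
only: sub_groups, the "use_muon" and "momentum" keys, the placeholder buffer
strings, and shared_momentum are assumptions, not the actual DeepSpeed code.

```python
# Illustrative only: hypothetical names, not the real optimizer internals.
# One non-Muon sub-group precedes two Muon sub-groups.
sub_groups = [
    {"use_muon": False},
    {"use_muon": True, "momentum": 0.95},
    {"use_muon": True, "momentum": 0.95},
]

# List storage: only Muon groups append, so list positions drift away
# from sub-group indices. Indexing with sub-group index 2 is out of
# bounds, and index 1 silently returns the buffer of sub-group 2.
buffers_list = [f"buf{i}" for i, g in enumerate(sub_groups) if g["use_muon"]]

# Dict storage keyed by sub-group index: lookups stay correct no matter
# how many non-Muon groups come first.
buffers_dict = {i: f"buf{i}" for i, g in enumerate(sub_groups) if g["use_muon"]}


def shared_momentum(groups):
    """Return the one momentum used by all Muon groups, or raise ValueError."""
    values = {g["momentum"] for g in groups if g["use_muon"]}
    if len(values) > 1:
        raise ValueError(f"Muon groups must share one momentum, got {sorted(values)}")
    return values.pop() if values else None
```

With the list, the sub-group index cannot be used directly as a position;
with the dict, buffers_dict[2] resolves to the buffer of sub-group 2.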

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: PKUWZP <zhipeng.rainbowserie@gmail.com>