Make Muon optimizer easier to enable (#7555)
The original Muon optimizer PR
(https://github.com/deepspeedai/DeepSpeed/pull/7509) requires the user to
explicitly set the `use_muon` flag on parameters from `model.parameters()`,
as shown in the test
https://github.com/deepspeedai/DeepSpeed/blob/master/tests/unit/ops/muon/test_muon.py#L27
.
This PR integrates the setting of `use_muon` into DeepSpeed before engine
initialization, which makes the Muon optimizer easier to use. The user only
needs to change the optimizer in `config.json` from `AdamW` to `Muon`; no
code changes are required. This resolves the following issue:
https://github.com/deepspeedai/DeepSpeed/issues/7552
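
As a sketch, the config-only switch described above might look like the
following DeepSpeed `config.json` fragment. The `optimizer.type` change is the
point of this PR; the surrounding fields and the hyperparameter values shown
are illustrative assumptions, not values taken from the PR.

```json
{
  "train_batch_size": 8,
  "optimizer": {
    "type": "Muon",
    "params": {
      "lr": 0.001
    }
  }
}
```

With this PR, the same training script that previously used `"type": "AdamW"`
picks up Muon from the config alone, since DeepSpeed now tags parameters with
`use_muon` before engine initialization.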
---------
Signed-off-by: Ma, Guokai <guokai.ma@intel.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>