mlp_only_layers is more flexible than decoder_sparse_step (#30552)
* force back to commit ba40a21 and fix workflow errors
* match the review suggestions
* fix CI errors
* fix CI
* fix CI, format code
* fix CI, ruff format
* fix CI, ruff format again
* Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix the "default argument value is mutable" warning
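
For illustration, a minimal sketch of the per-layer choice this change is about. It assumes that a layer listed in mlp_only_layers always gets a dense MLP, and that any other layer gets the sparse MoE block when (layer_idx + 1) is a multiple of decoder_sparse_step. SketchMoeConfig, uses_sparse_moe and layer_kinds are hypothetical names used only here, not the real Qwen2MoeConfig API. The None sentinel shows one common way to avoid a mutable default argument.

```python
from typing import List, Optional


class SketchMoeConfig:
    """Hypothetical stand-in for a MoE config; not the real Qwen2MoeConfig."""

    def __init__(
        self,
        num_hidden_layers: int = 4,
        num_experts: int = 4,
        decoder_sparse_step: int = 1,
        mlp_only_layers: Optional[List[int]] = None,
    ):
        self.num_hidden_layers = num_hidden_layers
        self.num_experts = num_experts
        self.decoder_sparse_step = decoder_sparse_step
        # Use None as the default and build a fresh list per instance,
        # so instances never share one mutable default object.
        self.mlp_only_layers = [] if mlp_only_layers is None else list(mlp_only_layers)


def uses_sparse_moe(config: SketchMoeConfig, layer_idx: int) -> bool:
    """Assumed rule: explicit dense layers win over the periodic sparse step."""
    if layer_idx in config.mlp_only_layers:
        return False
    return config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0


def layer_kinds(config: SketchMoeConfig) -> List[str]:
    """Label each layer as 'moe' (sparse block) or 'mlp' (dense block)."""
    return [
        "moe" if uses_sparse_moe(config, i) else "mlp"
        for i in range(config.num_hidden_layers)
    ]


# decoder_sparse_step alone can only express a periodic pattern.
periodic = layer_kinds(SketchMoeConfig(decoder_sparse_step=2))

# mlp_only_layers can pick arbitrary layers, e.g. dense first and last layers.
custom = layer_kinds(SketchMoeConfig(decoder_sparse_step=1, mlp_only_layers=[0, 3]))
```

Under these assumptions, a pattern such as "dense first and last layer, sparse in between" cannot be written with decoder_sparse_step alone but is a one-line list with mlp_only_layers.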
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>