transformers
1c52cb7b - mlp_only_layers is more flexible than decoder_sparse_step (#30552)

Commit
1 year ago
mlp_only_layers is more flexible than decoder_sparse_step (#30552)

* force back to commit ba40a21 and fix workflow errors
* match the review suggestions
* fix ci errors
* fix CI
* fix ci, format code
* fix ci, ruff format
* fix ci, ruff format again
* Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* solve this warning: Default Argument Value is mutable

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
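To illustrate the claim in the commit title, here is a minimal standalone sketch of how an explicit list of dense-MLP layer indices can express layouts that a fixed step cannot. It does not import transformers and is not the code from this commit. The function names `uses_sparse_moe` and `layer_plan` are hypothetical, and the selection rule (a layer is sparse when it is not listed in `mlp_only_layers` and `(layer_idx + 1) % decoder_sparse_step == 0`) is an assumption about how the two options combine. The `None` default mirrors the usual fix for the "Default Argument Value is mutable" warning mentioned in the last bullet.

```python
from typing import List, Optional


def uses_sparse_moe(
    layer_idx: int,
    num_experts: int,
    decoder_sparse_step: int,
    mlp_only_layers: Optional[List[int]] = None,
) -> bool:
    """Return True if the layer should be a sparse MoE block (assumed rule)."""
    # Use None as the default and build the list inside the function,
    # so no mutable object is shared between calls.
    dense_layers = [] if mlp_only_layers is None else mlp_only_layers
    if layer_idx in dense_layers:
        return False  # explicitly forced to a dense MLP
    return num_experts > 0 and (layer_idx + 1) % decoder_sparse_step == 0


def layer_plan(
    num_layers: int,
    num_experts: int,
    decoder_sparse_step: int,
    mlp_only_layers: Optional[List[int]] = None,
) -> List[str]:
    """Label every layer as 'moe' or 'mlp' under the assumed rule."""
    return [
        "moe"
        if uses_sparse_moe(i, num_experts, decoder_sparse_step, mlp_only_layers)
        else "mlp"
        for i in range(num_layers)
    ]


# A step alone can only produce a regular pattern.
print(layer_plan(6, num_experts=4, decoder_sparse_step=2))

# An explicit list can force arbitrary layers to be dense.
print(layer_plan(6, num_experts=4, decoder_sparse_step=1, mlp_only_layers=[0, 1, 5]))
```

Under these assumptions, the second call yields a layout with dense layers at the first two positions and the last one, which no single step value could describe.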