transformers
1c52cb7b - mlp_only_layers is more flexible than decoder_sparse_step (#30552)

Commit

2 years ago

mlp_only_layers is more flexible than decoder_sparse_step (#30552)

* force back to commit ba40a21 and fix workflow errors
* match the review suggestions
* fix ci errors
* fix CI
* fix ci, format code
* fix ci, ruff format
* fix ci, ruff format again
* Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* solve this warning: Default Argument Value is mutable

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
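The commit title contrasts two configuration options, and the last bullet mentions a mutable default argument. The sketch below illustrates both under stated assumptions: `uses_sparse_moe` and `sparse_layers` are hypothetical helpers, not functions from the repository, and the selection rule (a layer is sparse when its index is not in `mlp_only_layers` and `(layer_idx + 1) % decoder_sparse_step == 0`) is an assumption about how the two options combine, not something this page confirms.

```python
from typing import List, Optional


def uses_sparse_moe(
    layer_idx: int,
    num_experts: int,
    decoder_sparse_step: int,
    mlp_only_layers: Optional[List[int]] = None,
) -> bool:
    """Hypothetical helper: decide whether a decoder layer gets a sparse MoE block."""
    # Use None as the default and build the list inside the function,
    # so no mutable default object is shared across calls.
    mlp_only_layers = [] if mlp_only_layers is None else mlp_only_layers

    # Assumed rule: an explicit per-layer override wins over the periodic step.
    if layer_idx in mlp_only_layers:
        return False
    return num_experts > 0 and (layer_idx + 1) % decoder_sparse_step == 0


def sparse_layers(
    num_layers: int,
    num_experts: int,
    decoder_sparse_step: int,
    mlp_only_layers: Optional[List[int]] = None,
) -> List[int]:
    """Hypothetical helper: list the layer indices that would be sparse."""
    return [
        i
        for i in range(num_layers)
        if uses_sparse_moe(i, num_experts, decoder_sparse_step, mlp_only_layers)
    ]
```

Under these assumptions, `decoder_sparse_step` can only express periodic patterns such as "every second layer", while `mlp_only_layers` can mark any individual layer as dense, which is one reading of "more flexible" in the commit title.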

References

#30552 - mlp_only_layers is more flexible than decoder_sparse_step

Author

eigen2017

Parents

73fcfb28
