@ArthurZucker can you merge this?
all checks have passed
Hey! As always, for us this will be a new model. With modular it should be super easy to add, however! https://huggingface.co/docs/transformers/en/modular_transformers 🤗
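For reference, a minimal sketch of what a modular file for the new model could look like (file and class names here are illustrative; the modular converter generates the full modeling file from a small diff like this):

```python
# modular_granitemoeshared.py -- hypothetical modular definition file.
# With the modular_transformers workflow, only the differences from the
# parent model are written here; the full modeling file is auto-generated.
from transformers.models.granitemoe.modeling_granitemoe import (
    GraniteMoeForCausalLM,
    GraniteMoeModel,
)


class GraniteMoeSharedModel(GraniteMoeModel):
    """Same backbone as GraniteMoe, with a shared expert added to each MoE layer."""


class GraniteMoeSharedForCausalLM(GraniteMoeForCausalLM):
    _tied_weights_keys = ["lm_head.weight"]
```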
We're adding an additional feature (shared experts) that doesn't break past checkpoints, and is an extension of our own model class. Would every extension entail a new model class?
Yes 🤗 I am sorry but that is the way we have been handling every single model so far!
Not sure how to get the tests to pass; some of the failures are not due to the changes I've made.
@ArthurZucker the PR is ready, please review.
The failing tests seem unrelated
Super clean! Super nice!
```python
class GraniteMoeSharedForCausalLM(GraniteMoeForCausalLM):
    _tied_weights_keys = ["lm_head.weight"]
```
If the MLP is shared, should it appear here?
No, it shouldn't.
Shared means shared in the sense of experts, not shared across layers.
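To illustrate the distinction, here is a rough sketch of what a shared expert adds inside a single MoE layer (module and parameter names are illustrative, not the exact implementation in this PR):

```python
import torch
import torch.nn as nn


class MoeLayerWithSharedExpert(nn.Module):
    """Toy MoE block: routed experts plus one always-on shared expert."""

    def __init__(self, hidden_size: int, num_experts: int, shared_intermediate_size: int):
        super().__init__()
        self.router = nn.Linear(hidden_size, num_experts)
        self.experts = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in range(num_experts)]
        )
        # The shared expert sees every token in this layer's MoE block;
        # "shared" refers to sharing across experts/tokens, not across layers.
        self.shared_expert = nn.Sequential(
            nn.Linear(hidden_size, shared_intermediate_size),
            nn.SiLU(),
            nn.Linear(shared_intermediate_size, hidden_size),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Route each token to its top-1 expert (simplified routing).
        weights = self.router(hidden_states).softmax(dim=-1)
        top_weight, top_idx = weights.max(dim=-1)
        routed = torch.zeros_like(hidden_states)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                routed[mask] = expert(hidden_states[mask]) * top_weight[mask].unsqueeze(-1)
        # The shared-expert output is simply added to the routed output.
        return routed + self.shared_expert(hidden_states)
```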
Thanks for approving, please merge as soon as possible :)
I have updated with the main branch, made the corresponding changes, and all checks have passed :)
Is the public release of Granite-4.0-Tiny-Preview in any way relevant to this PR? (like does it warrant any follow-up work, additional validation/testing/CI, etc.)
This PR adds support for shared experts in the GraniteMoE model class for the upcoming Granite 4.0 language models.
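For a quick sanity check, something along these lines should work once the PR is merged (the config fields below, e.g. `shared_intermediate_size` and `num_local_experts`, are my guesses at the relevant knobs, not necessarily the final names):

```python
from transformers import GraniteMoeSharedConfig, GraniteMoeSharedForCausalLM

# Tiny config just to instantiate the architecture; real checkpoints will
# ship their own config. shared_intermediate_size controls the shared expert.
config = GraniteMoeSharedConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_local_experts=4,
    shared_intermediate_size=128,
)
model = GraniteMoeSharedForCausalLM(config)
print(model.num_parameters())
```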
@ArthurZucker