transformers
add shared experts for upcoming Granite 4.0 language models
#35894
Merged

mayank31398 116 days ago

This PR adds support for shared experts in the GraniteMoE model class for the upcoming Granite 4.0 language models.
@ArthurZucker

mayank31398 114 days ago

@ArthurZucker can you merge this?
All checks have passed.

ArthurZucker commented on 2025-01-29 (113 days ago)

Hey! As always for us, this will need to be a new model. With modular it should be super easy to add, however! https://huggingface.co/docs/transformers/en/modular_transformers 🤗

shawntan 106 days ago

We're adding an additional feature (shared experts) that doesn't break past checkpoints, and is an extension of our own model class. Would every extension entail a new model class?
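
To make the backward-compatibility point concrete, here is a minimal pure-Python sketch. It is not the code in this PR. It assumes a hypothetical config field `shared_size` whose default of 0 means "no shared expert", so a config written before the feature existed produces the same output as before. All names (`MoeConfig`, `moe_layer`, `shared_size`) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class MoeConfig:
    # Hypothetical config, for illustration only.
    num_experts: int = 2
    shared_size: int = 0  # assumed default: 0 disables the shared expert


def routed_expert(x: float, expert_id: int) -> float:
    # Toy stand-in for a routed expert MLP: each expert scales its input differently.
    return x * (expert_id + 1)


def shared_expert(x: float, size: int) -> float:
    # Toy stand-in for a shared expert MLP whose effect depends on `size`.
    return x * size


def moe_layer(x: float, config: MoeConfig) -> float:
    # Toy router: pick one expert from the sign of the input.
    expert_id = 0 if x < 0 else config.num_experts - 1
    out = routed_expert(x, expert_id)
    # The shared expert, when enabled, is applied to every input and added on top.
    if config.shared_size > 0:
        out += shared_expert(x, config.shared_size)
    return out


old_config = MoeConfig()               # as an older checkpoint would specify it
new_config = MoeConfig(shared_size=3)  # opts in to the shared expert

print(moe_layer(2.0, old_config))  # routed path only
print(moe_layer(2.0, new_config))  # routed path plus shared expert
```

With the default config the shared branch is never taken, which is the sense in which such an extension would leave existing checkpoints untouched.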

ArthurZucker 105 days ago

Yes 🤗 I am sorry but that is the way we have been handling every single model so far!

shawntan force pushed from a0fd52ac to 9b2652ca 105 days ago
shawntan Modular GraniteMoE with shared Experts.
7e84bb6f
shawntan Modified
49612368
shawntan Import order.
f6fe3d94
shawntan Modified for style
36668f10
shawntan Fix space.
01819372
shawntan Test
df484522
shawntan Remove extra granitemoe file.
d81c969a
shawntan force pushed from c09ee2f7 to d81c969a 105 days ago
shawntan Merge branch 'main' into shared-experts
2b2da13c
shawntan 104 days ago

Not sure how to get the tests to pass; some of the failures are not due to the changes I've made.

shawntan New converted file and tests
95d35bb2
shawntan Modified __init__ files.
e2ece17e
shawntan Formatting.
662dfc23
shawntan Dummy PT objects
6fe42feb
Ssukriti register granitemoe shared model
a2b37b69
Ssukriti Merge branch 'main' into shared-experts
bca084d0
Ssukriti fix linting of a file
f11b1159
Ssukriti fix import in modeling file
822d0703
Ssukriti update generated modeling file
9026f35b
Ssukriti Merge branch 'main' into shared-experts
0032dc79
mayank31398 marked this pull request as draft 100 days ago
Ssukriti add documentation
03492a48
Ssukriti update docstrings
4f5cab5a
Ssukriti Merge branch 'main' into shared-experts
fd911fdd
Ssukriti update generated modeling file
a50a3207
Ssukriti fix docstrings in config class
101b786e
mayank31398 marked this pull request as ready for review 98 days ago
mayank31398 98 days ago

@ArthurZucker the PR is ready, please review.
The failing tests seem unrelated.

ArthurZucker requested a review from ArthurZucker 98 days ago
ArthurZucker approved these changes on 2025-02-13 (98 days ago)

Super clean! Super nice!

src/transformers/models/granitemoeshared/modular_granitemoeshared.py

    class GraniteMoeSharedForCausalLM(GraniteMoeForCausalLM):
        _tied_weights_keys = ["lm_head.weight"]

ArthurZucker 98 days ago

If the MLP is shared, should it appear here?

mayank31398 98 days ago 👍 1

No, it shouldn't.
"Shared" here means shared in the sense of experts (not shared across layers).
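
The distinction can be sketched in plain Python. This is a toy illustration under stated assumptions, not the model's implementation, and every class and attribute name below is hypothetical. The sketch assumes a shared expert is an extra MLP inside each MoE layer that every token passes through alongside its routed expert, with each layer owning its own copy. Tied weights, by contrast, are one parameter object referenced from two places, which is the kind of relationship `_tied_weights_keys` describes for `lm_head.weight`.

```python
class ToyExpert:
    """Toy stand-in for an MLP: multiplies its input by a single weight."""

    def __init__(self, weight: float):
        self.weight = weight

    def __call__(self, x: float) -> float:
        return x * self.weight


class ToyMoeLayer:
    def __init__(self, routed_weights, shared_weight):
        self.routed = [ToyExpert(w) for w in routed_weights]
        # One shared expert per layer; it is not reused by other layers.
        self.shared = ToyExpert(shared_weight)

    def __call__(self, x: float, expert_id: int) -> float:
        # Every token goes through the shared expert, whatever the router picked.
        return self.routed[expert_id](x) + self.shared(x)


layers = [
    ToyMoeLayer([1.0, 2.0], shared_weight=0.5),
    ToyMoeLayer([3.0, 4.0], shared_weight=0.25),
]

# "Shared" across experts: both routing choices include the same shared term.
print(layers[0](2.0, expert_id=0))
print(layers[0](2.0, expert_id=1))

# Not shared across layers: each layer holds a distinct shared expert object.
print(layers[0].shared is layers[1].shared)

# Weight tying, for contrast: two names refer to one parameter object.
embedding = ToyExpert(0.1)
lm_head = embedding
print(lm_head is embedding)
```

Under these assumptions the shared expert has nothing to tie: no second module reuses its parameters, so it would not belong in a list of tied weight keys.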

mayank31398 98 days ago

Thanks for approving, please merge as soon as possible :)

Ssukriti Merge branch 'main' into shared-experts
9985ea4f
Ssukriti merge main
997ef0e6
Ssukriti 97 days ago

I have updated the branch with main and made the corresponding changes, and all checks have passed :)

ArthurZucker merged a570e2ba into main 97 days ago
mayank31398 deleted the shared-experts branch 97 days ago
EwoutH 18 days ago (edited 18 days ago)

Is the public release of Granite-4.0-Tiny-Preview in any way relevant to this PR? For example, does it warrant any follow-up work, additional validation/testing/CI, etc.?
