transformers
[`StableLm`] Add QK normalization and Parallel Residual Support
#29745
Merged

jon-tow init: add StableLm 2 support
3e0509bb
jon-tow add integration test for parallel residual and qk layernorm
70662b93
ArthurZucker commented on 2024-03-25
ArthurZucker commented on 2024-03-27
ArthurZucker added the New model label
jon-tow update(modeling): match qk norm naming for consistency with phi/persi…
ea486cf4
jon-tow fix(tests): run fwd/bwd on random init test model to jitter norm weig…
b53747f2
jon-tow `use_parallel_residual`: add copy pointer to `GPTNeoXLayer.forward`
7d912dd0
ArthurZucker approved these changes on 2024-03-30
jon-tow refactor: rename head states var in `StableLmLayerNormPerHead`
593986a7
jon-tow tests: update test model and add generate check
72a01f54
ArthurZucker merged 2f12e408 into main 1 year ago
jon-tow deleted the add-stablelm-2-12b branch 1 year ago
