🚨 [v5] Refactor RoPE for layer types (#39847)
* update
* batch update model code
* typos
* too many diffs, dump
* dump again
* another dump
* fix copies
* make `rope_scaling_dict` self attr
* fix a few more tests
* another update
* fix a few more tests, hopefully last ones
* fix copies
* fix copies again
* fix newly added models, I hate rebasing on main
* update config files
* modular files
* fix rope utils test
* docstring has to be indented more, why?
* oops forgot to update some modular files
* copy from doesn't copy decorators?
* fix overridden test as well
* add a new test
* fix failing tests again
* update docstrings
* fix phi3
* fix two models
* fix copies
* forgot to add
* stupid bug from modular conversion
* fix slow tests
* update to call rotary emb once per model forward
* 3K tests failing?!
* update
* update more models
* fix copies
* fix the rest of tests hopefully
* fix after rebase
* fix the rope tests
* fix docs omni
* change a bit
* models with layer types
* why was it deleted?
* fix a few tests
* fix last test!
* delete extra empty lines
* add a test case
* more changes
* fix models
* typing hint for nested rope params
* missed when resolving conflicts
* delete layer types and fix typo
* fix copies
* fix copies
* update docs text
* docs
* huge update to all models
* fix copies
* rename attr to align with new format
* delete redundant rope tests
* trigger ci
* update the case
* this is why i hate rebasing
* maybe fixed?
* oops
* now fix?
* fix last tests and copies
* fix copies?
* fix minimax and gemma3n
* update typo
* deprecation end version
* final fix copies :fingers-crossed:
* oh my, add the docs in toctree
* ok, this is really the last fix
* fix copies and hope that tests won't start failing again
* use rope scaling if saved
* fix slow tests
* fix cwm and unrelated deepseek
* fix last
* update
* hope it works now, it took so long
* let's keep None for now, I will try to remove after checking tests
* some more fixes, find and replace does not always find all cases
* last fix of tests
* Arthur's comment for extra forward kwargs
* delete unused code
* fix slow qwen tests
* delete layer types from models
* faulty modular conversion
* fix qwen omni
* fix copies and style
* address my comment
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>