Refactor weight loading (#41580)
* ah actually we don't discard lm head if missing -> it needs to be moved to the correct device etc
* fix some tests
* small fixes
* up
* up
* idk why we tie weights twice but...
* ups
* remove unused
* fix hunyuan
* small fix
* nits
* ish
* up
* rev
* fix more tie weights keys
* small fixes
* nit
* update
* fix and fix
* fix a test
* glubs
* current shitty changes
* ship validated ones
* more
* more update
* more
* more
* more
* mllama
* more up
* fix ernie
* fix copies
* up more
* more fixes
* up
* up
* fix-copies
* fix more
* more updates
* AI UPDATE
* up
* hoey
* make it fast
* fix
* lol
* fix adjusting
* more fixes
* _dtype nit
* up
* nit
* update
* update
* remove semaphores
* fix import to avoid jit execution
* try to remove custom tying logic when it's stupid
* fix more individual models
* fix whisper as well
* fix?
* fix umt5
* improve tqdm bar
* cleanup a bit
* oupsi
* some updates
* improve
* remove all buffering -> much faster without it
* remove some tie_weights custom funcs when not needed
* more fixes related to strict matching regex
* remove ALL custom tie weights
* small update
* revert change to init scheme (no need for params)
* mixtral init
* try less strict source check
* tied weight first shot to the fiiiixxxxxx
* does this help?
* :)
* fix some poorly defined tied_weights_keys for now
* subclass nn.Parameters
* up
* lol
* Ouiiii
* fix led
* fix long cat flash
* fix qwen and long cat flash
* properly fix qwen init
* just push this for now
* prophetnet is dumb
* update
* push
* remove explicit sharing of some tied keys
* update decoder.bias
* moe case
* more changes to untangle old hardcoded tying
* fixup
* fix big failures
* fix prophetnet
* fix resize token embeddings
* nits
* fix xcodec
* asyncio?
* fix smart apply
* fix data2vec
* [build-ci-image]
* checkout
* update
* fix hunyuan
* update error message
* fix deformable detr
* fixes
* fix init weights for non param gate up projs
* shared todo?
* update some models
* big revert, don't break this behaviour
* ty @SunMarc this fixes the buffers
Co-authored-by: SunMarc <SunMarc@users.noreply.github.com>
* mt5 fuck
* fix lxmert
* nuke slow test fetcher
* fix zamba and deepcopy for now
* fix zamba tied weight keys! ~
* fix-copies
* update fetch test
* fix gradient for test modeling common!
* break "shared" for now, I will fix tomorrow; changes are properly isolated now :)
* does this fix marian? probably not
* fix some vlms
* D fine seems to handle this well
* glob is fine actually
* fix dab detr
* small steps
* opusy
* fix some more models?
* yups
* better error
* fix?
* fix double escape
* escape where it makes sense
* ??
* fix ibert
* fix tvp as well
* more fixes
* try always download ref PR
* ONONONO
* big fixup
* more fixup
* small step
* small nits
* nits
* brute force some stuff
* fix vilt
* make sure special models that always need tie always tie
* cleaning up
* small nits
* fix zamba and bridge tower!
* just fixup
* potential culprits
* revert bark and fix bridgetower
* remove now nonexistent tie_weights
* ?
* lol reformer actually had nothing tied!
* wow these two fucking models were really not well made
* fix sam family!
* fix bark revision
* fix speech2text ?
* push this for now....
* upsy
* the fuck
* fix rtdetr
* update
* proper
* wow that one's annoying
* update
* try to find the culprit
* get some help on common
* nit about general init and cls.padding_idx
* revert num workers update
* remove old loading func
* fix glob
* add annotations
* fix re
* small improvements
* clean some stuff
* improvements
* someone did not understannnnnnd what I tried to dooo or does BNB not support that either?
* gluos
* fix case when `.` is just not there
* remove unused arg
* recover original parameter/buffer using _original
* fix glob issue
* this?
* deepspeed best-effort
* remove unused stuff
* Update tie weight keys as they were just wrong
Co-authored-by: Benjamin Bossan <benjaminbossan@users.noreply.github.com>
* up
* augustuc clauss, a gloubs gloups gloubs
* fixup
* fixup
* there was a fucking typo
* mrain
* nits
* fix marian 3 remaining tests
* one more
* fix some of the copies, not all :)
* small cleanup
* one proper test
* fix core model loading test
* attempt a new test
* fix some of the annoying tests by supporting reading .bin sometimes
* push
* push more small fixes
* remove 1 useless test
* up
* fix audio flamingo post rebase
* fixup
* some small updates
* fix sam models
* nits
* up
* updates
* one more
* skip this stupid test
* some other fixes
* fixup
* update
* skip more offloaded stuff
* oups
* ups
* update mixtral
* skip this one
* LET'S GO
* fixup
* rope delta order
* fix csm
* small nit
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: SunMarc <SunMarc@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>