Fix model kwargs (#35875)
* Save state
* Make a failing test
* Better test
* mpt -> done, many more to go
* Rm extranious
* Bamba
* Bert
* big_bird
* biogpt
* bloom
* codegen
* ctrl
* data2vec
* dbrx
* Through up to Dbrx
* electra
* ernie
* falcon
* Fuyu/persimmon
* Include noop kwargs to base models
* Rebase
* Skip musigen
* Refactor/skip mllama
* Revert makefile
* Rm file
* Fix PT failing, need to modify rest of loss funcs to not resize
* Propagate some
* Continue
* More
* More options
* Mostly fixed
* Proved that it's the same
* Bloom is good
* Make ability to override loss func possible
* Fixup
* Clean
* Fix xglm
* Quality tests
* Skip OCR2
* Make specific loss for xglm
* Make order the same/line up 1:1
* xglm
* Skip fx output loss bloom model
* Didn't pass in pad_token_id
* Fix quality