Much more efficient and clearer weight initialization and weight tying (#42191)
* everything until informer
* everything until perceiver
* all of them finally
* style
* replace by transformers init everywhere
* use relative import instead
* deprecated models
* style
* start contexts
* small fixes
* fix modular
* remove class switch
* do not initialize tied weights
* typo
* fix
* improve
* improve comments
* improve
* improve
* fix zamba
* fix import
* add the post_init
* more post_init
* fix
* protect
* more post_init
* fix
* fixes
* fix
* fix
* switch flag name
* more fixes
* fixes
* fixes
* copies
* fix
* finally find the culprit
* style
* last small
* big bird
* better
* update init check
* final touch
* do it everywhere