🚨 Generalize `get_decoder()` for multimodal and delete redundant code 🔪 (#42156)
* update some models
* update the rest
* add helper for encoder
* delete encoder code from models
* fix copies
* fix some tests, but VLM tests will fail
* add encoder tests similar to decoder tests
* remove print
* fix overwritten models
* and a million exceptions with old audio models, revert back