Allow loading Qwen Thinker 'base' models without generative head (#45457)
* Allow loading Thinker 'base' models without generative head
Currently, for qwen2_5_omni and qwen3_omni_moe, only the 'Talker' variant (i.e. with audio output) can be loaded. With this change, the 'base' Thinker models can be loaded as well, for example to obtain token embeddings.
The glmasr_encoder, audioflamingo3_encoder, voxtral_encoder, etc. work similarly.
* Add Thinker architectures to MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES instead
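
For illustration only, here is a rough sketch of what registering an architecture in a name mapping such as MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES amounts to: a model type string is resolved to a class name. This is not the transformers source. The dict, the `resolve_class_name` helper, and both entries are assumptions made up for the example, and the exact keys and class names added by this change may differ.

```python
from collections import OrderedDict

# Hypothetical stand-in for MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES.
# Both entries are assumed names, used here for illustration only.
MAPPING_NAMES_SKETCH = OrderedDict(
    [
        ("qwen2_5_omni_thinker", "Qwen2_5OmniThinkerForConditionalGeneration"),
        ("qwen3_omni_moe_thinker", "Qwen3OmniMoeThinkerForConditionalGeneration"),
    ]
)


def resolve_class_name(model_type: str) -> str:
    """Return the class name registered for model_type, or raise ValueError."""
    try:
        return MAPPING_NAMES_SKETCH[model_type]
    except KeyError:
        # An unregistered model type cannot be resolved to a class.
        raise ValueError(f"Unrecognized model type: {model_type!r}") from None
```

In this sketch, a model type that is missing from the mapping raises an error, which is the kind of failure that registering the Thinker architectures is meant to avoid.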