Fixing bug in Voxtral when merging text and audio embeddings (#40671)
* Fixing bug when replacing text-audio token placeholders with audio embeddings
* apply changes
---------
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>