model: Add Mimo v2.5 model support (#22493)
* add mimo-v2.5 support
* mimo-v2.5: fix modify_tensors row split
* mimo-v2.5: forgot `add_attn_value_scale` plumbing
* mimo-v2.5: fix tp dequant to detect tp rows
* mimo-v2.5: fix TP iteration to be descending
* mimo-v2.5: fix comment
* mimo-v2.5: retain fused qkv
* mimo-v2.5: missed the attn_value scale during merge
* mimo-v2.5: fused QKV needs contiguous for scaling attention value
* mimo-v2.5: move `speech_embeddings.` to TextModel filter_tensors
* Update src/llama-hparams.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/models/mimo2.cpp
* Update src/models/mimo2.cpp
* Update convert_hf_to_gguf.py
* Update convert_hf_to_gguf.py
* Update src/models/mimo2.cpp
* mimo-v2.5: include MTP weights in gguf
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>