llama.cpp
f5636f8f - convert : add image break token fallback (#22914)

Committed 2 days ago
convert : add image break token fallback (#22914)

* convert : add image break token fallback

This commit adds an image_break_token_id fallback for Mistral models whose
config contains an image_break_token_id of -1:
```console
"vision_encoder": {
    "image_token_id": 10,
    "image_break_token_id": -1,
    ...
```
But tokenizer.json does define this token:
```console
"id": 12,
"content": "[IMG_BREAK]",
"single_word": false,
"lstrip": false,
"rstrip": false,
"normalized": false,
"special": true
},
```
If we look in convert_hf_to_gguf.py we have:
```python
elif self.is_mistral_format:
    # hparams is already the vision config here, so norm_eps is only defined in global_config.
    self.hparams["norm_eps"] = self.global_config.get("norm_eps", None)
    assert self.hparams["norm_eps"] is not None, "norm_eps not found in params.json"
    if self.use_break_tok:
        self.img_break_tok_id = self.find_vparam(["image_break_token_id"])
```
The motivation for this change is that converting such a model currently fails
with the following error:
```console
load_hparams: model size:    5131.60 MiB
load_hparams: metadata size: 0.15 MiB
clip_init: failed to load model 'models/mmproj-Mistral-Medium-3.5-128B.gguf': operator(): unable to find tensor v.token_embd.img_break
mtmd_init_from_file: error: Failed to load CLIP model from models/mmproj-Mistral-Medium-3.5-128B.gguf
Failed to load vision model from models/mmproj-Mistral-Medium-3.5-128B.gguf
```
With this fallback the model loads successfully.

Resolves: https://github.com/ggml-org/llama.cpp/issues/22901

* Revert "convert : add image break token fallback"

This reverts commit 292e40cfdf9a7553863007c018236f5f554f71d8.

* convert : add image break token fallback

(Re-applies the change above with the same commit message.)

Co-authored-by: Pascal <admin@serveurperso.com>

Resolves: https://github.com/ggml-org/llama.cpp/issues/22901

* convert : allow zero value for img_break_tok_id
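The fallback described above can be sketched roughly as follows. This is a hypothetical standalone helper, not the actual code in convert_hf_to_gguf.py; the function name `resolve_img_break_token_id` and its signature are assumptions for illustration. It also shows why the final commit ("allow zero value for img_break_tok_id") matters: the validity check must compare against `None` and negative sentinels explicitly, since a plain truthiness test would wrongly reject a legitimate token id of 0.

```python
import json


def resolve_img_break_token_id(vision_config: dict, tokenizer_json_path: str) -> int:
    """Hypothetical sketch: prefer the config value, but if it is missing or a
    sentinel like -1, look the [IMG_BREAK] token up in tokenizer.json."""
    tok_id = vision_config.get("image_break_token_id")
    # Explicit None/negative check: `if tok_id:` would treat a valid id of 0 as missing.
    if tok_id is not None and tok_id >= 0:
        return tok_id
    # Fallback: scan the tokenizer's added special tokens for [IMG_BREAK].
    with open(tokenizer_json_path, encoding="utf-8") as f:
        tokenizer = json.load(f)
    for tok in tokenizer.get("added_tokens", []):
        if tok.get("content") == "[IMG_BREAK]":
            return tok["id"]
    raise ValueError("image_break_token_id not found in config or tokenizer.json")
```

For the model in this commit, the config value is -1 while tokenizer.json carries `[IMG_BREAK]` with id 12, so the fallback path would resolve the id to 12 and the `v.token_embd.img_break` tensor can be written during conversion.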