llama.cpp
support MiniCPM-V-2.6 #8967
Merged

tc-mb · 276 days ago

Dear llama.cpp Official,

Hi, I'm writing to submit a new PR that integrates our model MiniCPM-V 2.6 into llama.cpp. MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series; it is stronger than its predecessors and supports multi-image understanding and video understanding.

This version of the model supports video understanding, and I have implemented functions such as video frame extraction in my fork. However, because that work introduces ffmpeg as a dependency, it may cause environment and compilation issues on other devices. I therefore propose splitting the work into multiple PRs:

  1. This PR submits the model changes themselves. I hope it can be merged soon, so that the community can start using MiniCPM-V 2.6 via GGUF.
  2. A later PR will add support for video input, giving us more time to discuss how llama.cpp can best integrate video understanding.

Best regards,
MiniCPM-V Official ^_^

tc-mb init
7a49a6f6
tc-mb rename
c536fa6e
tc-mb add run android for termux in readme
2b919034
tc-mb add android readme
0480d5fa
tc-mb add instructions in readme
ec1cea71
tc-mb change name in readme
a491f45c
iceflame89 Update README.md
7573b634
harvestingmoon fixed line
94dcaba6
tc-mb Merge pull request #1 from harvestingmoon/minicpm-v2.5
b31f51f5
tc-mb add result in readme
629420ee
tc-mb random pos_embed
b48708af
tc-mb add positions index
d9fbc1d1
tc-mb change for ollama
18fe6209
tc-mb change for ollama
2997a680
tc-mb better pos_embed in clip
8541e996
tc-mb support ollama
d8974b8e
tc-mb updata cmakelist
e73a0c7c
tc-mb updata cmakelist
6366d62d
tc-mb rename wrapper
056d1781
tc-mb clear code
3c306f18
tc-mb replace and organize code
9495504e
tc-mb add link
b37ab0b1
tc-mb Merge branch 'prepare-PR-of-minicpm-v2.5' into prepare-PR
8767ce29
tc-mb Merge pull request #7 from OpenBMB/prepare-PR
8bd47ce5
tc-mb Merge pull request #8 from OpenBMB/master
28d4a7f9
tc-mb sync master
02eb445d
tc-mb fix warnings
07f48f96
tc-mb fix warnings
c38d152d
tc-mb fix bug in bicubic resize when need resize iamge smaller
88f5e6ab
tc-mb receive review comments and modify
a913ca4c
tc-mb receive review comments and modify
a95a6d99
tc-mb Merge branch 'ggerganov:master' into prepare-PR-of-minicpm-v2.5
c390dd4e
tc-mb put all code into llava dir
efe4c617
tc-mb Merge pull request #11 from OpenBMB/pr_add_all_in_llava
ee5b8509
tc-mb Merge branch 'prepare-PR-of-minicpm-v2.5' into master
77beb4d1
tc-mb Merge pull request #15 from OpenBMB/master
cb8cfb9d
tc-mb fix quality problem in pr code
8f035057
tc-mb change n_layer
e68c8bc1
tc-mb add space in "-1"
4c67d7ce
tc-mb imitate reshape bug of python code
977941d9
tc-mb fix bug in clip
3e6348b8
tc-mb fix issues for merging
c5b68515
tc-mb fix llama-minicpmv-cli in cmake file
5959b14b
tc-mb change pr readme
292a4690
tc-mb fix code review
be8b5b2f
tc-mb remove in line 33 directory in the /cmakelists.txt (not in example, i…
4c755832
tc-mb fix cmakefile
62fa15bc
tc-mb add warn
dad4abe1
tc-mb fix KEY_HAS_MINICPMV_PROJ
3642be99
tc-mb remove load_image_size into clip_ctx
fcde9971
tc-mb remove the extern "C", MINICPMV_API
6fd0937e
tc-mb fix uhd code for review comment
107e1edb
tc-mb delete minicpmv-wrapper in pr
72b96292
tc-mb remove uhd_image_embed
f3d400da
tc-mb Modify 2 notes
65f7455c
tc-mb support minicpmv2.6
6da5130b
tc-mb modify convert script of minicpmv
77c580de
tc-mb modify convert
ea0c8283
tc-mb Merge branch 'prepare-PR-of-minicpm-v2.6' into master
fc1c860b
tc-mb Merge pull request #24 from OpenBMB/master
ce0d1a6f
tc-mb modify convert
6cad864c
tc-mb add readme
fe39ecc1
tc-mb add resampler of v2.6
bffbe1cf
tc-mb modify clip
28d6a0f4
tc-mb modify readme
4a87d1d9
mofosyne added Review Complexity : Medium
github-actions added examples
github-actions added python
tc-mb fix type-check
32b47f60
x4080 · 276 days ago

waiting for merge

tc-mb fix type-check
662d4c14
tc-mb fix type-check
a945b3ca
tc-mb fix type-check
89d378c7
Vaibhavs10 commented on 2024-08-12
examples/llava/README-minicpmv2.6.md
### Usage of MiniCPM-V 2.6

Convert the PyTorch model to GGUF files (you can also download the converted [gguf](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) files we provide)
Vaibhavs10 · 274 days ago

Quick comment on the HF Hub repo: https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf

Adding usage instructions on the repo model card would be good so that people who discover the repo can quickly run it via llama.cpp too.

tc-mb · 274 days ago

Good idea. I will update it on hf later.

Galunid commented on 2024-08-12
examples/llava/minicpmv-convert/minicpmv2_6-surgery.py
Galunid · 274 days ago

That's the same thing as minicpmv2_5-surgery.py. I think it's much better to generalize it and not duplicate the code here.

tc-mb · 274 days ago

OK, I will modify it.

Galunid commented on 2024-08-12
examples/llava/minicpmv-convert/minicpmv2_6-convert-image-encoder-to-gguf.py
def get_input_embeddings(self) -> nn.Module:
    return self.embeddings.patch_embedding
import argparse
Galunid · 274 days ago

Pretty much everything below this comment is duplicated in minicpmv2_5-convert-image-encoder-to-gguf.py

tc-mb · 274 days ago

Most of it is the same, but not all of it: the model parts and several parameters differ.

Considering that this is just a script used to convert the model, maybe it's okay to simply copy and modify it like this? I think keeping separately named scripts ensures that users run the right one without conversion errors.

Or do you think it would be better to combine the conversion scripts into one?

I would be happy to keep discussing this with you.

Galunid · 274 days ago
6d413
< import torch
9d415
< from transformers.models.idefics2.modeling_idefics2 import Idefics2VisionTransformer, Idefics2VisionConfig
147,148c553,555
< vision_config = Idefics2VisionConfig(**default_vision_config)
< model = Idefics2VisionTransformer(vision_config)
---
> 
> vision_config = SiglipVisionConfig(**default_vision_config)
> model = SiglipVisionTransformer(vision_config)
161c568
< minicpmv_version = 2
---
> minicpmv_version = 3
169c576
<     minicpmv_version = 2
---
>     minicpmv_version = 3
280c687
<             re.sub("pos_embed", "pos_embed_k", s): torch.from_numpy(get_2d_sincos_pos_embed(4096, (70, 70))),
---
>             re.sub("pos_embed", "pos_embed_k", s): torch.from_numpy(get_2d_sincos_pos_embed(3584, (70, 70))),
284c691
<             re.sub("proj", "pos_embed_k", s): torch.from_numpy(get_2d_sincos_pos_embed(4096, (70, 70))),
---
>             re.sub("proj", "pos_embed_k", s): torch.from_numpy(get_2d_sincos_pos_embed(3584, (70, 70))),

You could use argparse to add another argument for v2.5/v2.6 model and load configs based on that.

tc-mb · 274 days ago

ok
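Galunid's suggestion could look roughly like the sketch below. This is a hypothetical outline, not the actual convert script: the function names (`select_config`, `parse_args`) and the config dict fields are illustrative, and only the version-dependent values are taken from the diff above (Idefics2 with 4096-dim positional embeddings for V2.5, SigLIP with 3584-dim for V2.6).

```python
import argparse

def select_config(minicpmv_version: int) -> dict:
    """Pick the vision-tower config for a given MiniCPM-V version.

    Values mirror the diff in the review thread: V2.5 uses an Idefics2
    vision transformer with 4096-dim sincos positional embeddings, while
    V2.6 uses SigLIP with 3584-dim embeddings.
    """
    if minicpmv_version == 2:    # MiniCPM-V 2.5
        return {"vision_tower": "idefics2", "pos_embed_dim": 4096}
    if minicpmv_version == 3:    # MiniCPM-V 2.6
        return {"vision_tower": "siglip", "pos_embed_dim": 3584}
    raise ValueError(f"unsupported minicpmv_version: {minicpmv_version}")

def parse_args(argv=None):
    ap = argparse.ArgumentParser()
    ap.add_argument("--minicpmv_version", type=int, default=2,
                    help="2 for MiniCPM-V 2.5, 3 for MiniCPM-V 2.6")
    return ap.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print(select_config(args.minicpmv_version))
```

One script with a version flag avoids the duplicated files while still keeping the per-version differences explicit in a single place.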

tc-mb modify convert script and readme
1ec79f04
tc-mb fix convert script and readme
11233763
tc-mb fix convert
f30c5e11
tc-mb fix num in convert
47eb0a55
yorkane · 274 days ago

waiting for merge

Vaibhavs10 commented on 2024-08-12
examples/llava/minicpmv-convert-image-encoder-to-gguf.py
has_vision_encoder = True
has_minicpmv_projector = False
Vaibhavs10 · 274 days ago

Trailing space!

tc-mb · 274 days ago

ok

tc-mb fix type-check
1ca3f06a
Vaibhavs10 requested a review from Galunid · 273 days ago
HaishengLiang · 271 days ago

waiting for merge

nanowell · 271 days ago

waiting for merge

Vaibhavs10 requested a review from ggerganov · 270 days ago
ggerganov merged d565bb2f into master · 270 days ago
saket424 · 269 days ago (edited)

I have opened issue 9066 for a crash I experienced after this pull request was merged. The crash is unrelated to the MiniCPM-V-2.6 model itself. I hope you can reproduce the error.

tc-mb · 267 days ago

I have opened issue 9066 for a crash I experienced after this pull request was merged. The crash is unrelated to the MiniCPM-V-2.6 model itself. I hope you can reproduce the error.

Hello, I saw that the issue you mentioned is that llava crashes, but my update only touches the minicpmv part. I'm not sure about the issue, but I suspect it may not be caused by this branch.
Could you test whether this branch also crashes before the merge? Of course, if the problem was indeed introduced by this PR, I will be very happy to help fix it.

saket424 · 267 days ago

@tc-mb
The crash is not directly related to your MiniCPM 2.6 PR, other than that there was no crash before your PR and a crash after it, owing to some uninitialized variables.

Here is a PR that appears to fix the issue I reported
#9082

Sorry for the false alarm

tc-mb · 267 days ago

@tc-mb The crash is not directly related to your MiniCPM 2.6 PR, other than that there was no crash before your PR and a crash after it, owing to some uninitialized variables.

Here is a PR that appears to fix the issue I reported #9082

Sorry for the false alarm

I'm glad your problem was solved.

x4080 · 267 days ago

@tc-mb Can we use MiniCPM with a context cache, so that we upload an image once and ask multiple questions about the same image?

tc-mb · 266 days ago

@tc-mb Can we use MiniCPM with a context cache, so that we upload an image once and ask multiple questions about the same image?

Yes, the context is now cached.

You can run in interactive mode to ask multiple rounds of questions.

./llama-minicpmv-cli -m ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-V-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -i

or modify the minicpmv-cli function (which is more like an example) to achieve the functionality you want.

yizhangliu · 266 days ago

Eagerly awaiting...

tc-mb deleted the prepare-PR-of-minicpm-v2.6 branch · 266 days ago
fairydreaming commented on 2024-08-20
examples/llava/minicpmv-convert-image-encoder-to-gguf.py
fname_middle = "mmproj-"
has_text_encoder = False
has_minicpmv_projector = True
minicpmv_version = 3
fairydreaming · 266 days ago

Is this line necessary? It overrides the minicpmv_version value set on the command line when converting MiniCPM-V 2.5, which results in a broken mmproj-model-f16.gguf.
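The override fairydreaming describes can be sketched as follows. This is a deliberately simplified, hypothetical reduction of the convert script's control flow (the function names are illustrative), showing why the command-line value is lost and what removing the hardcoded assignment would restore:

```python
# Simplified sketch: the projector branch unconditionally reassigns
# minicpmv_version, discarding the value set from the command line.
def effective_version_buggy(cli_version: int) -> int:
    minicpmv_version = cli_version  # e.g. 2, passed when converting V2.5
    minicpmv_version = 3            # hardcoded line under discussion
    return minicpmv_version

# Dropping the hardcoded assignment lets the command-line value through,
# so a V2.5 conversion keeps version 2 and produces a valid mmproj file.
def effective_version_fixed(cli_version: int) -> int:
    minicpmv_version = cli_version
    return minicpmv_version
```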

x4080 · 266 days ago

@tc-mb Can we use MiniCPM with a context cache, so that we upload an image once and ask multiple questions about the same image?

Yes, the context is now cached.

You can run in interactive mode to ask multiple rounds of questions.

./llama-minicpmv-cli -m ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-V-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -i

or modify the minicpmv-cli function (which is more like an example) to achieve the functionality you want.

Cool, that's a great feature, thanks @tc-mb

dewarrn1 · 263 days ago

Very cool! Are GPU operations supported at this time?

tc-mb · 263 days ago

Very cool! Are GPU operations supported at this time?

I have tested on Ubuntu with an Nvidia 4090; it works and the speed looks good. You can use it in the following way:

make LLAMA_CUDA=1
Then add an appropriate -ngl parameter, for example:
./llama-minicpmv-cli -m ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-V-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?" -ngl 100

dewarrn1 · 263 days ago

Awesome, thanks!

saket424 · 261 days ago

@tc-mb
Can you give us usage instructions for serving MiniCPM 2.6 with llama-server, so we can send it OpenAI-compatible chat completion requests with base64-encoded images?

tc-mb · 258 days ago

@tc-mb Can you give us usage instructions for serving MiniCPM 2.6 with llama-server, so we can send it OpenAI-compatible chat completion requests with base64-encoded images?

Sorry, I didn't test the server path when I updated this; I will add support for this capability in the near future.

apepkuss · 175 days ago

@tc-mb Could you please add prompt-templating info to README-minicpmv2.6.md, like the llava-cli templating and llava-1.6 prompting sections in the llava README? For practical usage it is necessary to know how to arrange the user question and the image, and also whether the image should be passed as raw bytes or base64. Thanks!
