📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 527 iterations 🚀

[Benchmark charts: llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 527 iterations. Time series over the window 1717659739 → 1717660371 for llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, and llamacpp:requests_processing.]
It's how the tokens are handled in llama.cpp. I'm in the middle of figuring out how tokenizers operate under the hood and seeing if there's a way to create a bridge between the two. Actually, your input would be invaluable (#7379). Or if you know someone better suited who has a deeper understanding of tokenizers (e.g. BPE/WPM) in general. I'm interested in Jina because the English version uses WPM; the Spanish and Dutch versions use BPE. I'm more focused on Llama-2 and Llama-3 for BPE.
Aside: I have no idea how the CI/CD is set up here. I have some experience with Jenkins, but all of this is outside the scope of what I'm focused on. Also, I'm just a contributor; I just chime in when I think I might have something of value to add.
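As a side note, one quick way to check which algorithm a given Hugging Face tokenizer uses is to read the `model.type` field of its `tokenizer.json`; a minimal sketch, assuming the standard `tokenizers` file format (the paths below are placeholders):

```python
import json

def tokenizer_algorithm(path: str) -> str:
    # tokenizer.json stores the algorithm under model.type,
    # e.g. "WordPiece" (WPM) or "BPE".
    with open(path, encoding="utf-8") as f:
        return json.load(f)["model"]["type"]

# Placeholder paths to locally downloaded tokenizer files:
# print(tokenizer_algorithm("jina-embeddings-v2-base-en/tokenizer.json"))  # WordPiece expected
# print(tokenizer_algorithm("jina-embeddings-v2-base-es/tokenizer.json"))  # BPE expected
```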
Hey @teleprint-me,
To be honest, I found it quite hard to work with the tokenizer logic here, but I do not quite understand what you aim to achieve in #7379. If you want, we can jump on a call to discuss and make this process more agile.
Could you also guide me on how to fix the CI problems?
Rebase on the latest master and the CI should work.
I will, thanks
So `):\t` should not be matched. Is there any logic in the code that eliminates these `\` patterns from the vocab?
Hm, not sure why this happens. We don't escape strings in the vocab, only in the prompt input.
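If it helps the investigation, a quick way to see what the upstream vocab actually contains (independent of llama.cpp) is to scan the Hugging Face tokenizer for tokens with literal tabs or backslashes; a rough sketch, assuming the `transformers` package is installed (note that byte-level BPE tokenizers may store a tab as the byte-mapped character `ĉ` rather than a literal `\t`):

```python
from transformers import AutoTokenizer

# Load the upstream tokenizer and list vocab entries containing a tab or backslash,
# to check whether patterns like "):\t" are stored unescaped.
tok = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-code")
suspicious = [t for t in tok.get_vocab() if "\t" in t or "\\" in t]
print(len(suspicious), suspicious[:20])
```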
I will try to investigate this.
I am also trying to see if I can add support for a Chinese model. I managed to get it to work for English, but not for the Chinese characters. Is there a supported model in Chinese, so I can see which tokenizer they use for inspiration, etc.?
I believe the most recent model that we added and also supports Chinese is https://huggingface.co/deepseek-ai/DeepSeek-V2. See if @fairydreaming's PR could be of any help: #7519
Hey @ggerganov ,
I am starting to think that it is not a problem of the tokenizer.
Here is my observation.
I am tryng to run this code to check how the embedding behaves:
gdb --args ../build/bin/embedding -m ./jina-embeddings-v2-base-code.gguf --threads 1 --verbose-prompt -p "for idx, x in enumerate(xs):\n print(idx, x)"
and this is what gdb is telling me:
(gdb) run
Starting program: /home/joan/workspace/ollama/llm/llama.cpp/build/bin/embedding -m ./jina-embeddings-v2-base-code.gguf --threads 1 --verbose-prompt -p for\ idx,\ x\ in\ enumerate\(xs\):\\n\ \ \ \ print\(idx,\ x\)
Look at all the `\` that have been added. This seems to be the reason why I get different tokenization; in Python, if I add an extra `\` before `\\n`, I get the same encoding.
I am not sure if it is a problem with how the standard input is encoded or something? Do you happen to have any clue about this?
If I hardcode this sentence and avoid the split lines:
params.prompt = "for idx, x in enumerate(xs):\n print(idx, x)";
I get the same behavior as in Python.
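For what it's worth, this matches how the shell hands over arguments: the `-p` value arrives as the two characters `\` and `n`, while the hardcoded C++ literal `"\n"` is a single newline character. A small illustrative sketch of the difference (plain Python, not llama.cpp code):

```python
# What the command line delivers: a literal backslash followed by 'n' (two characters).
cli_prompt = r"for idx, x in enumerate(xs):\n    print(idx, x)"

# What the hardcoded C++ string literal contains: a real newline (one character).
hardcoded_prompt = "for idx, x in enumerate(xs):\n    print(idx, x)"

print(repr(cli_prompt))                # ...xs):\\n    print...
print(repr(hardcoded_prompt))          # ...xs):\n    print...
print(cli_prompt == hardcoded_prompt)  # False -> the two strings tokenize differently
```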
I see, does adding `-e` to the command-line arguments fix the issue?
../build/bin/embedding -m ./jina-embeddings-v2-base-code.gguf --threads 1 --verbose-prompt -e -p "for idx, x in enumerate(xs):\n print(idx, x)"
Oh, it does!
How then can we be sure this behavior is available in the server? I see this escape option available only in the example itself.
I believe `server` already escapes these through the JSON parsing library. Btw, all examples now escape by default since #7675, so there is no need to even add `-e` explicitly.
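That behavior is easy to check from a client: a `\n` escape inside a JSON string is decoded to a real newline by the server's JSON parser before tokenization. A minimal sketch, assuming the server is running locally on the default port 8080 and exposes the `/embedding` endpoint, with the `requests` package installed:

```python
import requests

# requests serializes the dict to JSON, so the Python newline becomes the JSON escape \n;
# the server's JSON parser turns it back into a real newline before the prompt is tokenized.
resp = requests.post(
    "http://localhost:8080/embedding",
    json={"content": "for idx, x in enumerate(xs):\n    print(idx, x)"},
)
print(resp.status_code)
print(resp.json())
```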
Hey @ggerganov,
Is there something from my code that may have caused this CI to fail?
Probably just a fluke, will restart the workflows now
@ggerganov I tested the behavior in the server and it works; I consider this ready to be reviewed.
PR to allow using `jinaai/jina-embeddings-v2-base-code` with `llama.cpp`. It has an extra normalization layer compared to other models of the JinaV2 family, which is why it is handled independently.
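For readers unfamiliar with the distinction, here is a rough, hypothetical sketch of what an "extra normalization layer" after the embedding block can look like in a BERT-style encoder; the module names and placement are assumptions for illustration only and do not claim to mirror the actual jina-embeddings-v2-base-code architecture:

```python
import torch
import torch.nn as nn

class EmbeddingWithExtraNorm(nn.Module):
    """Hypothetical embedding block with one additional LayerNorm (illustration only)."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, hidden_size)
        self.emb_norm = nn.LayerNorm(hidden_size)    # the usual embedding LayerNorm
        self.extra_norm = nn.LayerNorm(hidden_size)  # the "extra" normalization layer

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.extra_norm(self.emb_norm(self.tok_emb(token_ids)))

# Usage with placeholder sizes (not the real model dimensions):
# out = EmbeddingWithExtraNorm(vocab_size=30000, hidden_size=768)(torch.tensor([[1, 2, 3]]))
```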