llama.cpp
feat: add changes to handle jina v2 base code #7596
Merged

JoanFM · 1 year ago (edited)

PR to allow using jinaai/jina-embeddings-v2-base-code with llama.cpp. It has an extra normalization layer compared to the other models of the JinaV2 family, which is why it is handled separately.
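For context, a minimal PyTorch-style sketch of what such an extra normalization inside a gated FFN block could look like (module names and placement here are hypothetical, not the actual HF or llama.cpp implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFFNWithExtraNorm(nn.Module):
    """Illustrative only: a GLU-style feed-forward block with an additional
    LayerNorm applied before the down projection."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gated_layers = nn.Linear(hidden_size, 2 * intermediate_size, bias=False)
        self.norm = nn.LayerNorm(intermediate_size)  # the "extra" normalization layer
        self.down_proj = nn.Linear(intermediate_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        up, gate = self.gated_layers(x).chunk(2, dim=-1)  # fused up/gate projection
        return self.down_proj(self.norm(up * F.gelu(gate)))
```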

feat: add changes to handle jina v2 base code
cc0ac097
fix: do not complicate things
21936ddb
JoanFM force pushed from dd42a71d to 21936ddb 1 year ago
github-actions added the python label
github-actions · 1 year ago (edited)

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 527 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8891.02ms p(95)=21954.48ms fails=, finish reason: stop=474 truncated=53
  • Prompt processing (pp): avg=104.81tk/s p(95)=444.45tk/s
  • Token generation (tg): avg=45.37tk/s p(95)=46.03tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=feat-jina-v2-base-code commit=4c4d877d23dd27fc7e323b4a2623db825e8bd29f

[Charts: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing; llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 527 iterations]

mofosyne added the Review Complexity : Medium label
fix: fix the usage of the code model
9a65c7a2
JoanFM force pushed from 17a5e9f4 to 9a65c7a2 1 year ago
Merge branch 'master' of https://github.com/JoanFM/llama.cpp into fea…
96a6f552
JoanFM force pushed from 4117c40b to 96a6f552 1 year ago
teleprint-me · 1 year ago (edited)

It's how the tokens are handled in llama.cpp. I'm in the middle of figuring out how tokenizers operates under the hood and seeing if there's a way to create a bridge between the two. Actually, your input would be invaluable (#7379). Or if you know someone that's better suited and has a deeper understanding of tokenizers (e.g. BPE/WPM) in general. I'm interested in Jina because the English version uses WPM. The Spanish and Dutch versions use BPE. I'm more focused on Llama-2 and Llama-3 for BPE.
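For illustration, a small Python sketch of how one can check which tokenizer model backs a given checkpoint (the repo names and the trust_remote_code flag are assumptions; requires the transformers package):

```python
import json

from transformers import AutoTokenizer

# Print the backing tokenizer model type ("WordPiece" vs "BPE") for a couple of
# Jina v2 checkpoints; the JSON layout is the standard `tokenizers` serialization.
for repo in ("jinaai/jina-embeddings-v2-base-en", "jinaai/jina-embeddings-v2-base-es"):
    tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
    model_type = json.loads(tok.backend_tokenizer.to_str())["model"]["type"]
    print(repo, "->", model_type)
```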

Aside: I have no idea how the CI/CD is set up here. I have some experience with Jenkins, but all of this is outside the scope of what I'm focused on. Also, I'm just a contributor; I just chime in when I think I might have something of value to add.

JoanFM · 1 year ago

> It's how the tokens are handled in llama.cpp. I'm in the middle of figuring out how tokenizers operates under the hood and seeing if there's a way to create a bridge between the two. Actually, your input would be invaluable (#7379). Or if you know someone that's better suited and has a deeper understanding of tokenizers (e.g. BPE/WPM) in general. I'm interested in Jina because the English version uses WPM. The Spanish and Dutch versions use BPE. I'm more focused on Llama-2 and Llama-3 for BPE.
>
> Aside: I have no idea how the CI/CD is set up here. I have some experience with Jenkins, but all of this is outside the scope of what I'm focused on. Also, I'm just a contributor; I just chime in when I think I might have something of value to add.

Hey @teleprint-me,

To be honest, I found it quite hard to work with the tokenizer logic here, but I do not quite understand what you aim to achieve in #7379. If you want, we can jump on a call to discuss and make this process more agile.

ggerganov · 1 year ago

> could you also guide me on how to fix the CI problems?

Rebase on latest master and the CI should work

JoanFM · 1 year ago

> could you also guide me on how to fix the CI problems?
>
> Rebase on latest master and the CI should work

I will, thanks

ggerganov · 1 year ago

> So ):\t it should not be matched. Is there any logic in the code that eliminates these patterns \ from the vocab?

Hm, not sure why this happens. We don't escape strings in the vocab - only in the prompt input:

https://github.com/ggerganov/llama.cpp/blob/3b38d48609280aa5f8ab7ea135a4351b2a5ee240/common/common.cpp#L249-L257

JoanFM · 1 year ago

> So ):\t it should not be matched. Is there any logic in the code that eliminates these patterns \ from the vocab?
>
> Hm, not sure why this happens. We don't escape strings in the vocab - only in the prompt input:
>
> https://github.com/ggerganov/llama.cpp/blob/3b38d48609280aa5f8ab7ea135a4351b2a5ee240/common/common.cpp#L249-L257

I will try to investigate this

JoanFM · 1 year ago

@ggerganov,

I am also trying to see if I can add support for a Chinese model. I managed to get it to work for English, but not for Chinese characters. Is there a supported model in Chinese, so I can take inspiration from which tokenizer they use, etc.?

ggerganov · 1 year ago

I believe the most recent model that we added and also supports Chinese is https://huggingface.co/deepseek-ai/DeepSeek-V2. See if @fairydreaming's PR could be of any help: #7519

JoanFM · 1 year ago (edited)

Hey @ggerganov ,

I am starting to think that it is not a problem of the tokenizer.

Here is my observation.

I am trying to run this command to check how the embedding behaves:

gdb --args ../build/bin/embedding -m ./jina-embeddings-v2-base-code.gguf --threads 1 --verbose-prompt -p "for idx, x in enumerate(xs):\n    print(idx, x)"

and this is what gdb is telling me:

(gdb) run
Starting program: /home/joan/workspace/ollama/llm/llama.cpp/build/bin/embedding -m ./jina-embeddings-v2-base-code.gguf --threads 1 --verbose-prompt -p for\ idx,\ x\ in\ enumerate\(xs\):\\n\ \ \ \ print\(idx,\ x\)

Look at all the \ that have been added. This seems to be the reason why I get a different tokenization; in Python, if I add an extra \ before \\n I get the same encoding.

I am not sure if it is a problem with how the standard input is encoded or something. Do you happen to have any clue about this?

If I hardcode this sentence (avoiding the command-line escaping):

params.prompt = "for idx, x in enumerate(xs):\n    print(idx, x)";

I get the same behavior as in Python.
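For illustration, a minimal Python sketch of the difference (unicode_escape is only a rough stand-in for the escape processing llama.cpp applies to the prompt):

```python
# The shell/argv delivers the two characters '\' and 'n' literally, so without
# escape processing the prompt contains a literal backslash instead of a newline.
raw = r"for idx, x in enumerate(xs):\n    print(idx, x)"    # as received from argv
processed = raw.encode("ascii").decode("unicode_escape")    # rough analogue of escaping

print(repr(raw))        # ends in '):\\n    print(idx, x)' -> tokenizes differently
print(repr(processed))  # ends in '):\n    print(idx, x)'  -> matches the hard-coded prompt
```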

ggerganov · 1 year ago

I see, does adding -e to the command-line argument fix the issue?

 ../build/bin/embedding -m ./jina-embeddings-v2-base-code.gguf --threads 1 --verbose-prompt -e -p "for idx, x in enumerate(xs):\n    print(idx, x)"
JoanFM · 1 year ago

> I see, does adding -e to the command-line argument fix the issue?
>
> ../build/bin/embedding -m ./jina-embeddings-v2-base-code.gguf --threads 1 --verbose-prompt -e -p "for idx, x in enumerate(xs):\n    print(idx, x)"

Oh, it does!

Merge branch 'master' of https://github.com/JoanFM/llama.cpp into fea…
0fc775ed
JoanFM marked this pull request as ready for review 1 year ago
JoanFM commented on 2024-06-04
Conversation is marked as resolved
convert-hf-to-gguf.py

if 'gated_layer' in name:
    d1 = data[:self.intermediate_size, :]
    name1 = name.replace('gated_layers', 'gated_layers_w')
    name1 = name.replace('up_gated_layer', 'gated_layers_v')

JoanFM · 1 year ago

have to fix this

Conversation is marked as resolved
convert-hf-to-gguf.py

if chkhsh == "b6dc8df998e1cfbdc4eac8243701a65afe638679230920b50d6f17d81c098166":
    # ref: https://huggingface.co/allenai/OLMo-1.7-7B-hf
    res = "olmo"
if chkhsh == "a8594e3edff7c29c003940395316294b2c623e09894deebbc65f33f1515df79e":

JoanFM · 1 year ago

have to fix this

Conversation is marked as resolved
convert-hf-to-gguf.py

# NOTE: if you get an error here, you need to update the convert-hf-to-gguf-update.py script
# or pull the latest version of the model from Huggingface
# don't edit the hashes manually!
if chkhsh == "0ef9807a4087ebef797fc749390439009c3b9eda9ad1a097abbe738f486c01e5":

JoanFM · 1 year ago

have to fix this

Conversation is marked as resolved
convert-hf-to-gguf.py

    name1 = name.replace('up_gated_layer', 'gated_layers_v')
    d2 = data[self.intermediate_size:, :]
    name2 = name.replace('gated_layers', 'gated_layers_v')
    name2 = name.replace('up_gated_layer', 'gated_layers_w')

JoanFM · 1 year ago

have to fix this
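As an aside, a rough numpy sketch of what the hunks above do (variable names taken from the diff; the surrounding class in convert-hf-to-gguf.py is assumed): the fused gated FFN weight is split in half along its first dimension, and each half is written out under its own tensor name.

```python
import numpy as np

intermediate_size = 4
# Toy stand-in for a fused up/gate projection weight of shape (2 * intermediate, hidden).
data = np.arange(2 * intermediate_size * 3, dtype=np.float32).reshape(2 * intermediate_size, 3)

d1 = data[:intermediate_size, :]   # first half, written under one gated_layers_* name
d2 = data[intermediate_size:, :]   # second half, written under the other
assert d1.shape == d2.shape == (intermediate_size, 3)
```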

fix: fix comments
4bce30cc
JoanFM · 1 year ago

> I see, does adding -e to the command-line argument fix the issue?
>
> ../build/bin/embedding -m ./jina-embeddings-v2-base-code.gguf --threads 1 --verbose-prompt -e -p "for idx, x in enumerate(xs):\n    print(idx, x)"

@ggerganov,

How can we then be sure this behavior is also available in the server? I see this escape option is only available in the example itself.

ggerganov · 1 year ago

I believe the server already escapes these through the JSON parsing library. Btw, all examples now escape by default since #7675, so there is no need to even add -e explicitly.
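For illustration, a small sketch of the server path (the endpoint, field name, and default port are assumptions based on the llama.cpp server examples of that period, and the response shape may differ between versions): the \n written into the JSON body is turned back into a real newline by the server's JSON parser before tokenization, so no command-line escaping is involved.

```python
import json
import urllib.request

# json.dumps encodes the newline as the JSON escape \n; the server's JSON parser
# decodes it back to a real newline before the prompt is tokenized.
payload = json.dumps({"content": "for idx, x in enumerate(xs):\n    print(idx, x)"}).encode()
req = urllib.request.Request(
    "http://localhost:8080/embedding",            # assumed default host/port and endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))                # embedding result (shape depends on version)
```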

fix: fix linting issues
3b44f8f6
JoanFM force pushed from 0481e5fe to 3b44f8f6 1 year ago
JoanFM commented on 2024-06-05
Conversation is marked as resolved
llama.cpp
JoanFM · 1 year ago

I will put it back

Conversation is marked as resolved
llama.cpp
JoanFM · 1 year ago

fix this

Conversation is marked as resolved
llama.cpp
JoanFM · 1 year ago

change back

fix: remove ollama patches
05659d3c
JoanFM force pushed from 404daca1 to 05659d3c 1 year ago
JoanFM · 1 year ago

Hey @ggerganov,

Is there something in my code that may have caused this CI failure?

ggerganov · 1 year ago

Probably just a fluke, will restart the workflows now

Merge branch 'master' of https://github.com/JoanFM/llama.cpp into fea…
7ab6023b
JoanFM · 1 year ago

@ggerganov I tested the behavior in the server and it works; I consider this ready to be reviewed.

ggerganov style : minor
4c4d877d
ggerganov approved these changes on 2024-06-06
ggerganov merged f5d7b268 into master 1 year ago
