model : add PLaMo-2 model #14560
wip: llama : separate recurrent states from the KV cache
271104c6
llama : use std::find for seq_nodes in llama_rs_cache
8db1e4d4
llama : state checkpoints for recurrent models
0028010d
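The checkpointing idea in the commit above: periodically snapshot a recurrent model's state so a sequence can be rewound to an earlier position without recomputing from the start. A minimal Python sketch of that general idea (hypothetical names and API, not the llama.cpp implementation):

```python
class RecurrentStateCheckpoints:
    """Sketch: keep periodic copies of a recurrent state so a sequence
    can be rolled back to an earlier position (hypothetical API)."""

    def __init__(self):
        self.checkpoints = {}  # pos -> state snapshot

    def save(self, pos, state):
        # snapshot the state as it was after processing up to `pos`
        self.checkpoints[pos] = list(state)

    def rollback(self, pos):
        # restore the latest checkpoint at or before `pos`
        best = max((p for p in self.checkpoints if p <= pos), default=None)
        if best is None:
            raise ValueError("no checkpoint at or before pos")
        return list(self.checkpoints[best]), best
```

Tokens between the restored checkpoint position and the target position would then be reprocessed, trading some recompute for bounded memory.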
llama : correctly handle more edge cases for the rs cache
0c8b3b20
Merge branch 'master' into compilade/refactor-kv-cache
d66849f6
llama : rename many llama_kv_cache_* functions
a09db95e
Merge branch 'master' into compilade/refactor-kv-cache
c460ff1a
llama : remove useless return value for some llama_cache_* functions
b6fafd17
Merge branch 'master' into compilade/refactor-kv-cache
b7ec12eb
Merge branch 'master' into compilade/refactor-kv-cache
3b57b55c
llama : rethink recurrent state cell counts
7e13f19f
llama : support Jamba
cbc743e6
Merge branch 'master' into compilade/refactor-kv-cache
0fd13e94
llama : fix BERT inference without KV cache
61a88a1d
convert-hf : check for unprocessed Jamba experts
ea2e63e9
convert-hf : support Mini-Jamba conversion
fc59407e
llama : fix Jamba quantization sanity checks
181dadf2
llama : sequence-length-aware batch splitting
3a414b0b
Merge branch 'master' into compilade/refactor-kv-cache
4e4c41e5
llama : use equal-sequence-length sub-batches for recurrent models
3587a949
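Equal-sequence-length splitting means each sub-batch takes the same number of tokens from every sequence it contains, which keeps per-sequence recurrent state updates in lock-step. A rough Python sketch of the idea (names and shapes are illustrative, not the actual llama.cpp batch-splitting code):

```python
def split_equal_seq_len(per_seq_tokens):
    """Split {seq_id: [tokens...]} into sub-batches where every included
    sequence contributes the same number of tokens."""
    sub_batches = []
    remaining = {s: list(t) for s, t in per_seq_tokens.items() if t}
    while remaining:
        # the shortest remaining sequence bounds this sub-batch's length
        n = min(len(t) for t in remaining.values())
        sub_batches.append({s: t[:n] for s, t in remaining.items()})
        remaining = {s: t[n:] for s, t in remaining.items() if len(t) > n}
    return sub_batches
```

Sequences that run out of tokens simply drop out of later sub-batches, so a mix of long-prompt and single-token sequences still yields rectangular sub-batches.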
Merge branch 'master' into compilade/refactor-kv-cache
5d3c7b95
llama : fix batch split output count for embeddings
72eea492
llama : minimize swaps when reordering logits
18d1c140
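Minimizing swaps when applying a reordering usually means following permutation cycles, so each element is moved at most once per cycle. A hedged sketch of that general technique in Python (not the actual logits-reordering patch):

```python
def reorder_min_swaps(values, dst):
    """Move values[i] to position dst[i] in place, swapping only within
    permutation cycles (at most len(values) - 1 swaps overall)."""
    dst = list(dst)  # consumed as elements settle into place
    for i in range(len(values)):
        while dst[i] != i:
            j = dst[i]
            values[i], values[j] = values[j], values[i]
            dst[i], dst[j] = dst[j], dst[i]
    return values
```

Compared with swapping naively per out-of-place pair, cycle-following performs n minus the number of cycles swaps, which matters when the reordered rows are large logit vectors.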
llama : fix edge case finding batch seq_id of split recurrent cell
61200ef2
llama : avoid copies for simple batch splits
eb589d5e
llama : use im2col and mul_mat to perform convolution for Mamba
8fb57ac0
llama : fix .base() compilation error on Windows
17f6c1ef
llama : allow doing the equivalent of SSM_CONV with SUM_ROWS and MUL
fee3c1d7
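The short depthwise 1D convolution behind SSM_CONV can be phrased as an elementwise multiply followed by a row-wise sum, since each output element is just the dot product of a small per-channel window with per-channel weights. A NumPy sketch of the equivalence (illustrative shapes; in Mamba, d_conv is typically 4):

```python
import numpy as np

def ssm_conv_step(window, conv_w):
    """One time step of a depthwise conv expressed as MUL + SUM_ROWS.

    window: (d_inner, d_conv) last d_conv inputs per channel
    conv_w: (d_inner, d_conv) per-channel convolution weights
    returns: (d_inner,)
    """
    return (window * conv_w).sum(axis=-1)
```

Expressing the op this way lets backends that already implement elementwise multiply and row sums run it without a dedicated conv kernel.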
Merge branch 'master' into compilade/refactor-kv-cache
6840ac0b
llama : rename llama_cache to llama_past
372482df
examples : replace llama_kv_cache_seq_* with llama_past_seq_*
43d8d4bf
Merge branch 'master' into compilade/refactor-kv-cache
ff794f55
mamba : fix non-contiguous usage of ggml_silu
33425a7e
Merge branch 'master' into compilade/refactor-kv-cache
10c3c419
Merge branch 'master' into compilade/refactor-kv-cache
9b38f8bf
Merge branch 'master' into compilade/refactor-kv-cache
bc320ef6
llama : session saving and reloading for hybrid models
fcb889cf
Merge branch 'master' into compilade/refactor-kv-cache
a03e32a3
convert_hf : fix Jamba conversion
9d3f44da
llama : fix mixed signedness comparison
5f62db79
llama : use unused n_embd_k_gqa in k_shift
375de5b1
llama : begin renaming llama_past back to llama_kv_cache
4bb4b22a
Merge branch 'master' into compilade/refactor-kv-cache
63ac36b2
Merge branch 'master' into compilade/refactor-kv-cache
124c222f
llama : remove implicit recurrent state rollbacks
8006f3b3
Merge branch 'master' into compilade/refactor-kv-cache
691698e1
llama : partially apply clang-format style
e3fe6120
Merge branch 'master' into compilade/refactor-kv-cache
2bcaf64e
convert : fix jamba conv1d shape squeezing
908e6559
Merge branch 'master' into compilade/refactor-kv-cache
4682e21c
graph : add back hybrid memory graph input
20f8e43e
model : add Jamba to Mamba-specific hparams printing
07c252f0
mitmul changed the title from "Mitmul/add plamo2" to "Add PLaMo-2 model" 160 days ago
mitmul marked this pull request as ready for review 160 days ago

Merge branch 'master' into compilade/refactor-kv-cache
f7163582
Add PLaMo-2 model using hybrid memory module
f6567128
Fix z shape
4728e42a
mitmul force pushed to 4728e42a 159 days ago
Add cmath to include from llama-vocab.h
6acaf3c5
Explicitly dequantize normalization weights before RoPE apply
7e4c5ecc
Revert unnecessary cast because the problem can be solved by excludin…
149b98c8
Use ATTN_K/Q_NORM for k,q weights to prevent quantization
77865202
mitmul changed the title from "Add PLaMo-2 model" to "model : add PLaMo-2 model" 159 days ago
Remove SSM_BCDT that is not used from anywhere
0424a76e
Do not duplicate embedding weights for output.weight
ea95a1da
Fix tokenizer encoding problem for multibyte strings
2d76b21e
Merge remote-tracking branch 'upstream/master' into mitmul/add-plamo2
fccec6db
mitmul force pushed to fccec6db 157 days ago
Merge branch 'master' into mitmul/add-plamo2
5231e4f7
CISC commented on 2025-07-11
Apply suggestion from @CISC
521c1e0f
Update src/llama-model.cpp
df95fced
Use LLM_FFN_SWIGLU instead of splitting ffn_gate and ffn_up
498b8b37
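LLM_FFN_SWIGLU fuses the gate and up projections into a single matmul whose output is split and combined as silu(gate) * up, instead of running ffn_gate and ffn_up separately. A NumPy sketch of the fused-vs-split equivalence (illustrative only; it assumes the fused weight is [gate; up] concatenated along the output dimension):

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def ffn_swiglu(x, w_gate_up, w_down):
    """SwiGLU FFN with fused gate/up weights.

    w_gate_up: (d_model, 2 * d_ff), gate columns first, then up columns
    """
    gate, up = np.split(x @ w_gate_up, 2, axis=-1)
    return (silu(gate) * up) @ w_down
```

One fused projection halves the number of matmul launches for the FFN input, which is why the fused path is preferred when the backend supports it.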
Remove unnecessary part for Grouped Query Attention
6afd3be0
Fix how to load special token id to gguf
34360ebe
Remove unused tensor mapping
71abd3ad
CISC commented on 2025-07-12
Update src/llama-model.cpp
fb2ae69a
Remove llama_vocab_plamo2 class and replace it with llm_tokenizer_pla…
eea696e4
mitmul force pushed to eea696e4 153 days ago
ggerganov approved these changes on 2025-07-14
Update src/llama-vocab.cpp
841ffc85
CISC requested changes on 2025-07-14
CISC commented on 2025-07-14
Update convert_hf_to_gguf.py
35d81889
Update src/llama-model.cpp
d134e7f6
Update src/llama-model.cpp
921e864d
Merge remote-tracking branch 'upstream/master' into mitmul/add-plamo2
f87ac1c9
CISC requested changes on 2025-07-15
Update convert_hf_to_gguf.py
7b0b2ead
Update convert_hf_to_gguf.py
b42f95d6
CISC approved these changes on 2025-07-15
Fix plamo2 tokenizer session to prevent multiple calls of build()
6921534f
CISC merged 68e37a61 into master 152 days ago
mitmul deleted the mitmul/add-plamo2 branch 152 days ago