model : add PLaMo-2 model #14560
wip: llama : separate recurrent states from the KV cache
271104c6
llama : use std::find for seq_nodes in llama_rs_cache
8db1e4d4
llama : state checkpoints for recurrent models
0028010d
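The checkpointing idea in the commit above: periodically snapshot a recurrent model's state so a sequence can be rewound to an earlier position without recomputing from the start. A minimal Python sketch of that general idea (hypothetical names and API, not the llama.cpp implementation):

```python
class RecurrentStateCheckpoints:
    """Sketch: keep periodic copies of a recurrent state so a sequence
    can be rolled back to an earlier position (hypothetical API)."""

    def __init__(self):
        self.checkpoints = {}  # pos -> state snapshot

    def save(self, pos, state):
        # snapshot the state as it was after processing up to `pos`
        self.checkpoints[pos] = list(state)

    def rollback(self, pos):
        # restore the latest checkpoint at or before `pos`
        best = max((p for p in self.checkpoints if p <= pos), default=None)
        if best is None:
            raise ValueError("no checkpoint at or before pos")
        return list(self.checkpoints[best]), best
```

Tokens between the restored checkpoint position and the target position would then be reprocessed, trading some recompute for bounded memory.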
llama : correctly handle more edge cases for the rs cache
0c8b3b20
Merge branch 'master' into compilade/refactor-kv-cache
d66849f6
llama : rename many llama_kv_cache_* functions
a09db95e
Merge branch 'master' into compilade/refactor-kv-cache
c460ff1a
llama : remove useless return value for some llama_cache_* functions
b6fafd17
Merge branch 'master' into compilade/refactor-kv-cache
b7ec12eb
Merge branch 'master' into compilade/refactor-kv-cache
3b57b55c
llama : rethink recurrent state cell counts
7e13f19f
llama : support Jamba
cbc743e6
Merge branch 'master' into compilade/refactor-kv-cache
0fd13e94
llama : fix BERT inference without KV cache
61a88a1d
convert-hf : check for unprocessed Jamba experts
ea2e63e9
convert-hf : support Mini-Jamba conversion
fc59407e
llama : fix Jamba quantization sanity checks
181dadf2
llama : sequence-length-aware batch splitting
3a414b0b
Merge branch 'master' into compilade/refactor-kv-cache
4e4c41e5
llama : use equal-sequence-length sub-batches for recurrent models
3587a949
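Equal-sequence-length splitting means each sub-batch takes the same number of tokens from every sequence it contains, which keeps per-sequence recurrent state updates in lock-step. A rough Python sketch of the idea (names and shapes are illustrative, not the actual llama.cpp batch-splitting code):

```python
def split_equal_seq_len(per_seq_tokens):
    """Split {seq_id: [tokens...]} into sub-batches where every included
    sequence contributes the same number of tokens."""
    sub_batches = []
    remaining = {s: list(t) for s, t in per_seq_tokens.items() if t}
    while remaining:
        # the shortest remaining sequence bounds this sub-batch's length
        n = min(len(t) for t in remaining.values())
        sub_batches.append({s: t[:n] for s, t in remaining.items()})
        remaining = {s: t[n:] for s, t in remaining.items() if len(t) > n}
    return sub_batches
```

Sequences that run out of tokens simply drop out of later sub-batches, so a mix of long-prompt and single-token sequences still yields rectangular sub-batches.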
Merge branch 'master' into compilade/refactor-kv-cache
5d3c7b95
llama : fix batch split output count for embeddings
72eea492
llama : minimize swaps when reordering logits
18d1c140
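Minimizing swaps when applying a reordering usually means following permutation cycles, so each element is moved at most once per cycle. A hedged sketch of that general technique in Python (not the actual logits-reordering patch):

```python
def reorder_min_swaps(values, dst):
    """Move values[i] to position dst[i] in place, swapping only within
    permutation cycles (at most len(values) - 1 swaps overall)."""
    dst = list(dst)  # consumed as elements settle into place
    for i in range(len(values)):
        while dst[i] != i:
            j = dst[i]
            values[i], values[j] = values[j], values[i]
            dst[i], dst[j] = dst[j], dst[i]
    return values
```

Compared with swapping naively per out-of-place pair, cycle-following performs n minus the number of cycles swaps, which matters when the reordered rows are large logit vectors.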
llama : fix edge case finding batch seq_id of split recurrent cell
61200ef2
llama : avoid copies for simple batch splits
eb589d5e
llama : use im2col and mul_mat to perform convolution for Mamba
8fb57ac0
llama : fix .base() compilation error on Windows
17f6c1ef
llama : allow doing the equivalent of SSM_CONV with SUM_ROWS and MUL
fee3c1d7
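The short depthwise 1D convolution behind SSM_CONV can be phrased as an elementwise multiply followed by a row-wise sum, since each output element is just the dot product of a small per-channel window with per-channel weights. A NumPy sketch of the equivalence (illustrative shapes; in Mamba, d_conv is typically 4):

```python
import numpy as np

def ssm_conv_step(window, conv_w):
    """One time step of a depthwise conv expressed as MUL + SUM_ROWS.

    window: (d_inner, d_conv) last d_conv inputs per channel
    conv_w: (d_inner, d_conv) per-channel convolution weights
    returns: (d_inner,)
    """
    return (window * conv_w).sum(axis=-1)
```

Expressing the op this way lets backends that already implement elementwise multiply and row sums run it without a dedicated conv kernel.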
Merge branch 'master' into compilade/refactor-kv-cache
6840ac0b
llama : rename llama_cache to llama_past
372482df
examples : replace llama_kv_cache_seq_* with llama_past_seq_*
43d8d4bf
Merge branch 'master' into compilade/refactor-kv-cache
ff794f55
mamba : fix non-contiguous usage of ggml_silu
33425a7e
Merge branch 'master' into compilade/refactor-kv-cache
10c3c419
Merge branch 'master' into compilade/refactor-kv-cache
9b38f8bf
Merge branch 'master' into compilade/refactor-kv-cache
bc320ef6
llama : session saving and reloading for hybrid models
fcb889cf
Merge branch 'master' into compilade/refactor-kv-cache
a03e32a3
convert_hf : fix Jamba conversion
9d3f44da
llama : fix mixed signedness comparison
5f62db79
llama : use unused n_embd_k_gqa in k_shift
375de5b1
llama : begin renaming llama_past back to llama_kv_cache
4bb4b22a
Merge branch 'master' into compilade/refactor-kv-cache
63ac36b2
Merge branch 'master' into compilade/refactor-kv-cache
124c222f
llama : remove implicit recurrent state rollbacks
8006f3b3
Merge branch 'master' into compilade/refactor-kv-cache
691698e1
llama : partially apply clang-format style
e3fe6120
Merge branch 'master' into compilade/refactor-kv-cache
2bcaf64e
convert : fix jamba conv1d shape squeezing
908e6559
Merge branch 'master' into compilade/refactor-kv-cache
4682e21c
graph : add back hybrid memory graph input
20f8e43e
model : add Jamba to Mamba-specific hparams printing
07c252f0
mitmul changed the title from "Mitmul/add plamo2" to "Add PLaMo-2 model" 160 days ago
mitmul marked this pull request as ready for review 160 days ago

Merge branch 'master' into compilade/refactor-kv-cache
f7163582
Add PLaMo-2 model using hybrid memory module
f6567128
Fix z shape
4728e42a
mitmul force pushed to 4728e42a 159 days ago
Add cmath to include from llama-vocab.h
6acaf3c5
Explicitly dequantize normalization weights before RoPE apply
7e4c5ecc
Revert unnecessary cast because the problem can be solved by excludin…
149b98c8
Use ATTN_K/Q_NORM for k,q weights to prevent quantization
77865202
mitmul changed the title from "Add PLaMo-2 model" to "model : add PLaMo-2 model" 159 days ago
Remove SSM_BCDT that is not used from anywhere
0424a76e
Do not duplicate embedding weights for output.weight
ea95a1da
Fix tokenizer encoding problem for multibyte strings
2d76b21e
Merge remote-tracking branch 'upstream/master' into mitmul/add-plamo2
fccec6db
mitmul force pushed to fccec6db 157 days ago
Merge branch 'master' into mitmul/add-plamo2
5231e4f7
CISC commented on 2025-07-11
Apply suggestion from @CISC
521c1e0f
Update src/llama-model.cpp
df95fced
Use LLM_FFN_SWIGLU instead of splitting ffn_gate and ffn_up
498b8b37
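LLM_FFN_SWIGLU fuses the gate and up projections into a single matmul whose output is split and combined as silu(gate) * up, instead of running ffn_gate and ffn_up separately. A NumPy sketch of the fused-vs-split equivalence (illustrative only; it assumes the fused weight is [gate; up] concatenated along the output dimension):

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def ffn_swiglu(x, w_gate_up, w_down):
    """SwiGLU FFN with fused gate/up weights.

    w_gate_up: (d_model, 2 * d_ff), gate columns first, then up columns
    """
    gate, up = np.split(x @ w_gate_up, 2, axis=-1)
    return (silu(gate) * up) @ w_down
```

One fused projection halves the number of matmul launches for the FFN input, which is why the fused path is preferred when the backend supports it.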
Remove unnecessary part for Grouped Query Attention
6afd3be0
Fix how to load special token id to gguf
34360ebe
Remove unused tensor mapping
71abd3ad
CISC commented on 2025-07-12
Update src/llama-model.cpp
fb2ae69a
Remove llama_vocab_plamo2 class and replace it with llm_tokenizer_pla…
eea696e4
mitmul force pushed to eea696e4 153 days ago
ggerganov approved these changes on 2025-07-14
Update src/llama-vocab.cpp
841ffc85
CISC requested changes on 2025-07-14
CISC commented on 2025-07-14
Update convert_hf_to_gguf.py
35d81889
Update src/llama-model.cpp
d134e7f6
Update src/llama-model.cpp
921e864d
Merge remote-tracking branch 'upstream/master' into mitmul/add-plamo2
f87ac1c9
CISC requested changes on 2025-07-15
Update convert_hf_to_gguf.py
7b0b2ead
Update convert_hf_to_gguf.py
b42f95d6
CISC approved these changes on 2025-07-15
Fix plamo2 tokenizer session to prevent multiple calls of build()
6921534f
CISC merged 68e37a61 into master 152 days ago
mitmul deleted the mitmul/add-plamo2 branch 152 days ago