llama.cpp
llama : support Jamba hybrid Transformer-Mamba models
#7531
Merged

compilade merged 61 commits into master from compilade/refactor-kv-cache
compilade wip: llama : separate recurrent states from the KV cache
271104c6
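The starting point of the refactor: recurrent (Mamba) layers keep one small fixed-size state per sequence instead of per-token K/V cells, so they need their own storage next to the attention KV cache. A rough sketch of that separation, with made-up names and shapes assuming a Mamba-style conv/SSM state:

```python
# Conceptual sketch (not the PR's actual code): per-sequence recurrent states
# kept in their own pool, separate from the attention KV cache.
import numpy as np

class RecurrentStateCache:
    def __init__(self, n_seq_max: int, d_conv: int, d_state: int, d_inner: int):
        # one conv state and one SSM state slot per sequence, updated in place
        self.conv_states = np.zeros((n_seq_max, d_inner, d_conv - 1), dtype=np.float32)
        self.ssm_states  = np.zeros((n_seq_max, d_inner, d_state),    dtype=np.float32)

    def clear_seq(self, seq_id: int):
        self.conv_states[seq_id] = 0.0
        self.ssm_states[seq_id]  = 0.0

class AttentionKVCache:
    def __init__(self, n_ctx: int, n_kv_heads: int, head_dim: int):
        # per-token K/V cells, as in a regular Transformer layer
        self.k = np.zeros((n_ctx, n_kv_heads, head_dim), dtype=np.float32)
        self.v = np.zeros((n_ctx, n_kv_heads, head_dim), dtype=np.float32)

# A hybrid model like Jamba holds both kinds of storage: recurrent layers use
# the first, attention layers use the second.
```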
compilade llama : use std::find for seq_nodes in llama_rs_cache
8db1e4d4
compilade llama : state checkpoints for recurrent models
0028010d
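Unlike a KV cache, a recurrent state cannot simply be truncated back to an earlier position, which is what checkpoints address. A minimal sketch of the idea, with hypothetical names and a fixed checkpoint interval:

```python
# Conceptual sketch: periodic copies of a recurrent state so a sequence can be
# rolled back to an earlier position without reprocessing the whole prompt.
# Hypothetical illustration, not the PR's implementation.
import numpy as np

class StateCheckpoints:
    def __init__(self, interval: int = 64):
        self.interval = interval
        self.checkpoints = {}              # n_past -> saved state copy

    def maybe_save(self, n_past: int, state: np.ndarray):
        if n_past % self.interval == 0:
            self.checkpoints[n_past] = state.copy()

    def rollback(self, n_past: int):
        # find the latest checkpoint at or before the requested position
        usable = [p for p in self.checkpoints if p <= n_past]
        if not usable:
            return None, 0                 # caller must reprocess from the start
        best = max(usable)
        return self.checkpoints[best].copy(), best
```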
compilade llama : correctly handle more edge cases for the rs cache
0c8b3b20
compilade Merge branch 'master' into compilade/refactor-kv-cache
d66849f6
compilade llama : rename many llama_kv_cache_* functions
a09db95e
compilade Merge branch 'master' into compilade/refactor-kv-cache
c460ff1a
compilade llama : remove useless return value for some llama_cache_* functions
b6fafd17
compilade Merge branch 'master' into compilade/refactor-kv-cache
b7ec12eb
compilade Merge branch 'master' into compilade/refactor-kv-cache
3b57b55c
compilade llama : rethink recurrent state cell counts
7e13f19f
compilade llama : support Jamba
cbc743e6
compilade Merge branch 'master' into compilade/refactor-kv-cache
0fd13e94
compilade llama : fix BERT inference without KV cache
61a88a1d
compilade added labels: enhancement, model, refactoring, need feedback, embeddings, python, Review Complexity : High
compilade marked this pull request as draft 1 year ago
compilade commented on 2024-05-25
github-actions added the ggml label
compilade convert-hf : check for unprocessed Jamba experts
ea2e63e9
compilade convert-hf : support Mini-Jamba conversion
fc59407e
ggerganov commented on 2024-05-26
compilade llama : fix Jamba quantization sanity checks
181dadf2
compilade llama : sequence-length-aware batch splitting
3a414b0b
compilade Merge branch 'master' into compilade/refactor-kv-cache
4e4c41e5
compilade llama : use equal-sequence-length sub-batches for recurrent models
3587a949
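Recurrent layers advance all sequences of a sub-batch in lockstep, which is why sub-batches are built so that every sequence contributes the same number of tokens. A rough Python sketch of that splitting policy (hypothetical helper, not the actual llama.cpp batch-splitting code):

```python
# Conceptual sketch of equal-sequence-length sub-batch splitting: every
# sub-batch takes the same number of tokens from each sequence it contains.
from collections import defaultdict

def split_equal(batch):  # batch: list of (token, seq_id) pairs
    per_seq = defaultdict(list)
    for tok, seq_id in batch:
        per_seq[seq_id].append(tok)

    sub_batches = []
    while per_seq:
        # the largest count that every remaining sequence can still provide
        n_take = min(len(toks) for toks in per_seq.values())
        sub = {seq_id: toks[:n_take] for seq_id, toks in per_seq.items()}
        sub_batches.append(sub)
        per_seq = {s: t[n_take:] for s, t in per_seq.items() if len(t) > n_take}
    return sub_batches

# e.g. 5 tokens for seq 0 and 3 for seq 1 -> one sub-batch of 3+3, then one of 2
print(split_equal([(i, 0) for i in range(5)] + [(i, 1) for i in range(3)]))
```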
compilade Merge branch 'master' into compilade/refactor-kv-cache
5d3c7b95
compilade llama : fix batch split output count for embeddings
72eea492
compilade llama : minimize swaps when reordering logits
18d1c140
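When tokens are reordered for sub-batching, the output logits have to be permuted back into the caller's original order. One way to do that while moving each row only once is to follow the cycles of the permutation; a generic sketch of the technique (not the PR's exact code):

```python
# Conceptual sketch: apply a permutation to logit rows in place by following
# its cycles, so each row is moved once instead of via repeated pairwise swaps.
import numpy as np

def reorder_rows_in_place(rows: np.ndarray, order: list):
    # after the call, rows[i] holds what was previously rows[order[i]]
    done = [False] * len(order)
    for start in range(len(order)):
        if done[start] or order[start] == start:
            done[start] = True
            continue
        # rotate one cycle of the permutation
        tmp = rows[start].copy()
        i = start
        while order[i] != start:
            rows[i] = rows[order[i]]
            done[i] = True
            i = order[i]
        rows[i] = tmp
        done[i] = True

logits = np.arange(8, dtype=np.float32).reshape(4, 2)
reorder_rows_in_place(logits, [2, 0, 3, 1])
print(logits)   # rows now ordered as old rows 2, 0, 3, 1
```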
compilade llama : fix edge case finding batch seq_id of split recurrent cell
61200ef2
compilade llama : avoid copies for simple batch splits
eb589d5e
compilade llama : use im2col and mul_mat to perform convolution for Mamba
8fb57ac0
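Expressing a convolution as im2col followed by a matrix multiplication lets existing, well-optimized mat-mul kernels do the work. A small numpy check of that equivalence (generic 1-D convolution, not the exact ggml graph):

```python
# Conceptual sketch: 1-D convolution computed as im2col + matrix multiply,
# checked against a direct sliding-window implementation.
import numpy as np

def conv1d_direct(x, w):
    # x: (c_in, t), w: (c_out, c_in, k) -> (c_out, t - k + 1)
    c_out, c_in, k = w.shape
    t_out = x.shape[1] - k + 1
    y = np.zeros((c_out, t_out))
    for o in range(c_out):
        for i in range(t_out):
            y[o, i] = np.sum(w[o] * x[:, i:i + k])
    return y

def conv1d_im2col(x, w):
    c_out, c_in, k = w.shape
    t_out = x.shape[1] - k + 1
    # each column holds one flattened sliding window of x
    cols = np.stack([x[:, i:i + k].reshape(-1) for i in range(t_out)], axis=1)
    return w.reshape(c_out, -1) @ cols      # a plain matrix multiplication

x = np.random.rand(4, 16)
w = np.random.rand(3, 4, 5)
assert np.allclose(conv1d_direct(x, w), conv1d_im2col(x, w))
```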
compilade llama : fix .base() compilation error on Windows
17f6c1ef
compilade llama : allow doing the equivalent of SSM_CONV with SUM_ROWS and MUL
fee3c1d7
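Because Mamba's convolution is depthwise (one short kernel per inner channel), a single step of it can also be written with only an element-wise multiplication and a row-wise sum. A numpy illustration of that equivalence, with assumed shapes:

```python
# Conceptual sketch: one step of a depthwise causal conv (as in Mamba's conv
# state update) expressed as element-wise MUL followed by a row-wise SUM,
# instead of a dedicated convolution op.
import numpy as np

d_inner, d_conv = 6, 4
conv_state  = np.random.rand(d_inner, d_conv)   # last d_conv inputs per channel
conv_weight = np.random.rand(d_inner, d_conv)   # one kernel per channel

# dedicated depthwise conv step: per-channel dot product with its kernel
ref = np.array([conv_state[c] @ conv_weight[c] for c in range(d_inner)])

# the same thing as MUL (element-wise) + SUM over the kernel dimension
out = (conv_state * conv_weight).sum(axis=1)

assert np.allclose(ref, out)
```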
compilade Merge branch 'master' into compilade/refactor-kv-cache
6840ac0b
compilade llama : rename llama_cache to llama_past
372482df
compilade examples : replace llama_kv_cache_seq_* with llama_past_seq_*
43d8d4bf
compilade Merge branch 'master' into compilade/refactor-kv-cache
ff794f55
compilade mamba : fix non-contiguous usage of ggml_silu
33425a7e
github-actions added labels: android, examples, server
compilade Merge branch 'master' into compilade/refactor-kv-cache
10c3c419
compilade Merge branch 'master' into compilade/refactor-kv-cache
9b38f8bf
compilade Merge branch 'master' into compilade/refactor-kv-cache
bc320ef6
compilade llama : session saving and reloading for hybrid models
fcb889cf
compilade Merge branch 'master' into compilade/refactor-kv-cache
a03e32a3
compilade convert_hf : fix Jamba conversion
9d3f44da
compilade llama : fix mixed signedness comparison
5f62db79
compilade llama : use unused n_embd_k_gqa in k_shift
375de5b1
compilade llama : begin renaming llama_past back to llama_kv_cache
4bb4b22a
compilade Merge branch 'master' into compilade/refactor-kv-cache
63ac36b2
compilade Merge branch 'master' into compilade/refactor-kv-cache
124c222f
compilade llama : remove implicit recurrent state rollbacks
8006f3b3
compilade Merge branch 'master' into compilade/refactor-kv-cache
691698e1
compilade llama : partially apply clang-format style
e3fe6120
compilade Merge branch 'master' into compilade/refactor-kv-cache
2bcaf64e
compilade force pushed to 2bcaf64e 165 days ago
compilade convert : fix jamba conv1d shape squeezing
908e6559
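Torch's depthwise Conv1d stores its weight with a singleton middle dimension, which has to be squeezed away during conversion. A sketch of the kind of reshape involved, with assumed shapes (the actual Jamba tensor names and sizes may differ):

```python
# Conceptual sketch of the shape fix a converter applies to a Mamba-style
# conv1d weight: drop the singleton middle dimension so the stored tensor is
# (d_inner, d_conv). Shapes here are assumptions for illustration.
import numpy as np

w = np.random.rand(128, 1, 4)          # (d_inner, 1, d_conv) as stored in the HF checkpoint
assert w.shape[1] == 1                 # depthwise conv: one input channel per group
w_squeezed = w.squeeze(1)              # -> (d_inner, d_conv) for the converted tensor
print(w_squeezed.shape)
```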
compilade marked this pull request as ready for review 165 days ago
compilade commented on 2025-07-03
gabe-l-hart commented on 2025-07-03
compilade Merge branch 'master' into compilade/refactor-kv-cache
4682e21c
compilade graph : add back hybrid memory graph input
20f8e43e
compilade model : add Jamba to Mamba-specific hparams printing
07c252f0
compilade commented on 2025-07-03
ggerganov approved these changes on 2025-07-04
compilade Merge branch 'master' into compilade/refactor-kv-cache
f7163582
compilade added the merge ready label
compilade Merge branch 'master' into compilade/refactor-kv-cache
b0b280ea
compilade jamba : remove redundant nullptr initializations
db5ff0cc
CISC requested changes on 2025-07-08
compilade model : remove unnecessary prefix for tensor loading constants
2f39cd7b
compilade model : use ggml_swiglu_split for Mamba
f7c7a926
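The "split" SwiGLU variant takes the gate and the value from two separate tensors, which matches how Mamba gates its SSM output with a parallel z branch. A numpy stand-in for the fused op (assuming the split variant computes silu(gate) * value):

```python
# Conceptual sketch of split SwiGLU: gate and value come from two separate
# tensors, and the output is silu(gate) * value. Numpy stand-in, not ggml code.
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_split(gate, value):
    return silu(gate) * value

gate  = np.random.randn(4, 8)
value = np.random.randn(4, 8)
print(swiglu_split(gate, value).shape)   # (4, 8)
```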
CISC approved these changes on 2025-07-08
compilade Merge branch 'master' into compilade/refactor-kv-cache
a60a24be
compilade model : make falcon-h1 use shared mamba2 layer builder
7f3955a0
compilade memory : avoid referring to KV in recurrent cache logs
452207f3
compilade gguf-py : avoid adding duplicate tensor mappings for Jamba
4d6a179c
compilade merged 4a5686da into master 158 days ago
ggerganov added the hot label
