llama.cpp
llama : support Jamba hybrid Transformer-Mamba models #7531 (Merged)
compilade merged 61 commits into master from compilade/refactor-kv-cache
wip: llama : separate recurrent states from the KV cache (271104c6)
llama : use std::find for seq_nodes in llama_rs_cache (8db1e4d4)
llama : state checkpoints for recurrent models (0028010d)
llama : correctly handle more edge cases for the rs cache (0c8b3b20)
Merge branch 'master' into compilade/refactor-kv-cache (d66849f6)
llama : rename many llama_kv_cache_* functions (a09db95e)
Merge branch 'master' into compilade/refactor-kv-cache (c460ff1a)
llama : remove useless return value for some llama_cache_* functions (b6fafd17)
Merge branch 'master' into compilade/refactor-kv-cache (b7ec12eb)
Merge branch 'master' into compilade/refactor-kv-cache (3b57b55c)
llama : rethink recurrent state cell counts (7e13f19f)
llama : support Jamba (cbc743e6)
Merge branch 'master' into compilade/refactor-kv-cache (0fd13e94)
llama : fix BERT inference without KV cache (61a88a1d)
compilade added the enhancement, model, refactoring, need feedback, embeddings, python, and Review Complexity : High labels
compilade marked this pull request as draft 1 year ago
compilade commented on 2024-05-25
github-actions added the ggml label
convert-hf : check for unprocessed Jamba experts (ea2e63e9)
convert-hf : support Mini-Jamba conversion (fc59407e)
ggerganov commented on 2024-05-26
llama : fix Jamba quantization sanity checks (181dadf2)
llama : sequence-length-aware batch splitting (3a414b0b)
Merge branch 'master' into compilade/refactor-kv-cache (4e4c41e5)
llama : use equal-sequence-length sub-batches for recurrent models (3587a949)
Merge branch 'master' into compilade/refactor-kv-cache (5d3c7b95)
llama : fix batch split output count for embeddings (72eea492)
llama : minimize swaps when reordering logits (18d1c140)
llama : fix edge case finding batch seq_id of split recurrent cell (61200ef2)
llama : avoid copies for simple batch splits (eb589d5e)
llama : use im2col and mul_mat to perform convolution for Mamba (8fb57ac0)
llama : fix .base() compilation error on Windows (17f6c1ef)
llama : allow doing the equivalent of SSM_CONV with SUM_ROWS and MUL (fee3c1d7)
Merge branch 'master' into compilade/refactor-kv-cache (6840ac0b)
llama : rename llama_cache to llama_past (372482df)
examples : replace llama_kv_cache_seq_* with llama_past_seq_* (43d8d4bf)
Merge branch 'master' into compilade/refactor-kv-cache (ff794f55)
mamba : fix non-contiguous usage of ggml_silu (33425a7e)
github-actions added the android, examples, and server labels
Merge branch 'master' into compilade/refactor-kv-cache (10c3c419)
Merge branch 'master' into compilade/refactor-kv-cache (9b38f8bf)
Merge branch 'master' into compilade/refactor-kv-cache (bc320ef6)
llama : session saving and reloading for hybrid models (fcb889cf)
Merge branch 'master' into compilade/refactor-kv-cache (a03e32a3)
convert_hf : fix Jamba conversion (9d3f44da)
llama : fix mixed signedness comparison (5f62db79)
llama : use unused n_embd_k_gqa in k_shift (375de5b1)
llama : begin renaming llama_past back to llama_kv_cache (4bb4b22a)
Merge branch 'master' into compilade/refactor-kv-cache (63ac36b2)
Merge branch 'master' into compilade/refactor-kv-cache (124c222f)
llama : remove implicit recurrent state rollbacks (8006f3b3)
Merge branch 'master' into compilade/refactor-kv-cache (691698e1)
llama : partially apply clang-format style (e3fe6120)
Merge branch 'master' into compilade/refactor-kv-cache (2bcaf64e)
compilade force pushed to 2bcaf64e 165 days ago
convert : fix jamba conv1d shape squeezing (908e6559)
compilade marked this pull request as ready for review 165 days ago
compilade commented on 2025-07-03
gabe-l-hart commented on 2025-07-03
Merge branch 'master' into compilade/refactor-kv-cache (4682e21c)
graph : add back hybrid memory graph input (20f8e43e)
model : add Jamba to Mamba-specific hparams printing (07c252f0)
compilade commented on 2025-07-03
ggerganov approved these changes on 2025-07-04
Merge branch 'master' into compilade/refactor-kv-cache (f7163582)
compilade added the merge ready label
Merge branch 'master' into compilade/refactor-kv-cache (b0b280ea)
jamba : remove redundant nullptr initializations (db5ff0cc)
CISC requested changes on 2025-07-08
model : remove unnecessary prefix for tensor loading constants (2f39cd7b)
model : use ggml_swiglu_split for Mamba (f7c7a926)
CISC approved these changes on 2025-07-08
Merge branch 'master' into compilade/refactor-kv-cache (a60a24be)
model : make falcon-h1 use shared mamba2 layer builder (7f3955a0)
memory : avoid referring to KV in recurrent cache logs (452207f3)
gguf-py : avoid adding duplicate tensor mappings for Jamba (4d6a179c)
compilade merged commit 4a5686da into master 158 days ago
ggerganov added the hot label
Reviewers: CISC, ggerganov, ibrahimkhadraoui, gabe-l-hart
Assignees: No one assigned
Labels: enhancement, model, android, refactoring, need feedback, examples, embeddings, python, Review Complexity : High, server, ggml, merge ready, hot
Milestone: No milestone