llama.cpp
Granite Four #13550 (Merged)
CISC merged 173 commits into ggml-org:master from gabe-l-hart:GraniteFour
wip: llama : separate recurrent states from the KV cache
271104c6
llama : use std::find for seq_nodes in llama_rs_cache
8db1e4d4
llama : state checkpoints for recurrent models
0028010d
llama : correctly handle more edge cases for the rs cache
0c8b3b20
Merge branch 'master' into compilade/refactor-kv-cache
d66849f6
llama : rename many llama_kv_cache_* functions
a09db95e
Merge branch 'master' into compilade/refactor-kv-cache
c460ff1a
llama : remove useless return value for some llama_cache_* functions
b6fafd17
Merge branch 'master' into compilade/refactor-kv-cache
b7ec12eb
Merge branch 'master' into compilade/refactor-kv-cache
3b57b55c
llama : rethink recurrent state cell counts
7e13f19f
llama : support Jamba
cbc743e6
Merge branch 'master' into compilade/refactor-kv-cache
0fd13e94
llama : fix BERT inference without KV cache
61a88a1d
convert-hf : check for unprocessed Jamba experts
ea2e63e9
convert-hf : support Mini-Jamba conversion
fc59407e
llama : fix Jamba quantization sanity checks
181dadf2
llama : sequence-length-aware batch splitting
3a414b0b
Merge branch 'master' into compilade/refactor-kv-cache
4e4c41e5
llama : use equal-sequence-length sub-batches for recurrent models
3587a949
Merge branch 'master' into compilade/refactor-kv-cache
5d3c7b95
llama : fix batch split output count for embeddings
72eea492
llama : minimize swaps when reordering logits
18d1c140
llama : fix edge case finding batch seq_id of split recurrent cell
61200ef2
llama : avoid copies for simple batch splits
eb589d5e
llama : use im2col and mul_mat to perform convolution for Mamba
8fb57ac0
llama : fix .base() compilation error on Windows
17f6c1ef
llama : allow doing the equivalent of SSM_CONV with SUM_ROWS and MUL
fee3c1d7
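The commit above rephrases Mamba's causal depthwise convolution in terms of generic ops. A minimal sketch of the underlying identity (illustrative only, plain Python rather than ggml; names are hypothetical): per channel, the dot product of the last `d_conv` inputs with the conv weights equals an elementwise multiply followed by a sum over the window axis, i.e. MUL then SUM_ROWS.

```python
import random

# Illustrative sketch (not llama.cpp's actual kernel): SSM_CONV computes,
# per channel, a dot product between the last d_conv inputs and that
# channel's conv weights. The same result falls out of an elementwise
# multiply followed by a sum over the window axis.
d_conv, d_inner = 4, 8
w = [[random.gauss(0, 1) for _ in range(d_conv)] for _ in range(d_inner)]
window = [[random.gauss(0, 1) for _ in range(d_conv)] for _ in range(d_inner)]

# Direct formulation: one dot product per channel (what SSM_CONV does)
direct = [sum(x * y for x, y in zip(window[c], w[c])) for c in range(d_inner)]

# MUL then SUM_ROWS formulation
mul = [[window[c][i] * w[c][i] for i in range(d_conv)] for c in range(d_inner)]
fused = [sum(row) for row in mul]

assert all(abs(a - b) < 1e-12 for a, b in zip(direct, fused))
```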
Merge branch 'master' into compilade/refactor-kv-cache
6840ac0b
llama : rename llama_cache to llama_past
372482df
examples : replace llama_kv_cache_seq_* with llama_past_seq_*
43d8d4bf
Merge branch 'master' into compilade/refactor-kv-cache
ff794f55
mamba : fix non-contiguous usage of ggml_silu
33425a7e
Merge branch 'master' into compilade/refactor-kv-cache
10c3c419
Merge branch 'master' into compilade/refactor-kv-cache
9b38f8bf
llama : initial Mamba-2 support
1f0fea70
ggml : SIMD ggml_ssm_scan for Mamba-2
dceff23f
llama : support running Mamba-Codestral-7B-v0.1
2bfe9de6
llama : fix Mamba-2 conv state saving
aff96920
llama : remove unused variable
e04910dc
llama : add missing break
fa358e70
convert_hf : prefer SentencePiece tokenizer for Mamba-2 when present
38913dc8
Merge branch 'master' into compilade/refactor-kv-cache
bc320ef6
llama : session saving and reloading for hybrid models
fcb889cf
Merge branch 'master' into compilade/refactor-kv-cache
a03e32a3
convert_hf : fix Jamba conversion
9d3f44da
llama : fix mixed signedness comparison
5f62db79
llama : use unused n_embd_k_gqa in k_shift
375de5b1
llama : begin renaming llama_past back to llama_kv_cache
4bb4b22a
Merge branch 'master' into compilade/refactor-kv-cache
63ac36b2
Merge branch 'master' into compilade/mamba2
0e601caf
llama : avoid redundant state copy for Mamba 1 and 2
273e7a49
Merge branch 'master' into compilade/mamba2
7d6cb368
metal : attempt to adapt SSM_SCAN for Mamba-2
2c77d799
metal : fix SSM_SCAN pipeline scope
87b97d08
metal : use log and exp instead of log1pf and expf in SSM_SCAN
03d0e6ea
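The commit above swaps `log1pf`/`expf` for plain `log` and `exp` in the Metal SSM_SCAN kernel. A hedged sketch of why that substitution is numerically safe here: the scan applies a softplus, and `log1p(exp(x))` and `log(1 + exp(x))` agree except when `exp(x)` is vanishingly small, where `log1p` only buys extra precision.

```python
import math

# Hedged sketch: Mamba's SSM scan applies softplus(x) = log1p(exp(x)) to the
# time-step input. Where log1p is unavailable (e.g. in a shader language),
# the plain identity log(1 + exp(x)) gives the same value for moderate x.
def softplus_log1p(x: float) -> float:
    return math.log1p(math.exp(x))

def softplus_plain(x: float) -> float:
    return math.log(1.0 + math.exp(x))

for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    assert abs(softplus_log1p(x) - softplus_plain(x)) < 1e-9
```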
metal : remove unused arguments for SSM_SCAN
7a351abc
metal : add back n_seqs to SSM_SCAN args
8b15bc6f
metal : fix SSM_SCAN state head offset
5b8ec2b9
metal : fix wrong number of tokens per sequence in SSM_SCAN
62b09b34
Merge branch 'master' into compilade/refactor-kv-cache
124c222f
Merge branch 'master' into compilade/mamba2
038d9583
ggml : remove unused fast broadcast path in GGML_MUL
805512a7
Merge branch 'master' into compilade/mamba2
7d16e1bc
ggml : avoid multiply by D in GGML_OP_SSM_SCAN
3bc7103d
Merge branch 'master' into compilade/mamba2
8d8f0657
convert : fix flake8 lint
b4e9c599
llama : remove implicit recurrent state rollbacks
8006f3b3
Merge branch 'master' into compilade/refactor-kv-cache
691698e1
llama : partially apply clang-format style
e3fe6120
Merge branch 'master' into compilade/mamba2
1ee6c482
Merge branch 'master' into compilade/mamba2
c9ecf620
Merge branch 'master' into compilade/mamba2
35d06fac
metal : fix confusion between ; and ,
cf4f0a41
metal : add missing args for nb references in ssm_scan_f32_group
6def5cd7
metal : single-user mamba2 inference works
791998b4
kv-cache : remove const_cast when setting inputs for s_copy
94c3d530
Merge branch 'master' into compilade/mamba2
929fe85d
convert : avoid AutoConfig for Mamba and Mamba2 hparams
d55b0d06
kv-cache : allow context shift for recurrent models
e94f3932
github-actions added the testing, python, ggml, and Apple Metal labels
gabe-l-hart force-pushed from a89b966f to 00abc0df (121 days ago)
gabe-l-hart force-pushed from 00abc0df to ccd0d402 (116 days ago)
gabe-l-hart force-pushed from ccd0d402 to c6225f32 (113 days ago)
gabe-l-hart force-pushed from c6225f32 to 57d52a19 (109 days ago)
gabe-l-hart force-pushed from 57d52a19 to 63e099bb (108 days ago)
github-actions added the examples and server labels
gabe-l-hart force-pushed from 63e099bb to 311bccc0 (108 days ago)
gabe-l-hart force-pushed from 311bccc0 to 857fa82a (107 days ago)
gabe-l-hart force-pushed from 857fa82a to 2b20a09d (106 days ago)
gabe-l-hart force-pushed from 2b20a09d to 58085c7e (101 days ago)
gabe-l-hart force-pushed from 58085c7e to 3e8faff7 (100 days ago)
gabe-l-hart force-pushed from 9752355a to 6b8d79bc (96 days ago)
Merge branch 'master' into compilade/mamba2
9864bfcd
graph : fix recurrent state copies when avoiding copies
2fa5f2ce
ggml : fix mamba2 ssm scan when compiled with SVE
757aa623
gabe-l-hart force-pushed from 6b8d79bc to 8d2c8db3 (89 days ago)
gabe-l-hart force-pushed from 8d2c8db3 to 2bfd7563 (89 days ago)
gabe-l-hart force-pushed from 2bfd7563 to 71681e4f (87 days ago)
Merge branch 'master' into compilade/mamba2
a42f2394
cuda : implement ssm scan for Mamba2
f8c7caee
Merge branch 'master' into compilade/mamba2
830e5542
gabe-l-hart force-pushed from 71681e4f to faba0c3b (86 days ago)
github-actions added the Nvidia GPU label
Merge branch 'master' into compilade/mamba2
afdb6692
feat: Add conversion for Bamba models
28881af1
feat: Add Granite 4 conversion
c43259bd
feat: Plumb bamba through llama-arch
26816fd6
feat: Add bamba to llama_arch_is_hybrid_recurrent
b901947a
feat: Add optional mamba ssm_in bias tensor
fc56325a
feat: Add template specialization for get_arr to load a vector<uint32…
b3453dc9
feat: Use an explicit bool to determine mamaba vs mamba2
13e8d3df
feat: Isolate mamba(2) and granite attention layer building in static…
b435dce2
fix: Use per-layer sizes in granite build_attention_layer
3d4c36b5
feat: First (broken) pass at end-to-end Bamba implementation
0d28bf61
fix: Only do Granite multipliers if set
ed6216a7
refactor: Pull granite ffn portion into a static function and reuse i…
a6f9f90d
feat(py): Allow gguf duplicate keys if they match by value and type
de4d8701
refactor(py): Simplify granitemoehybrid conversion to use parents better
7c2b0b80
feat: Add GRANITE_MOE_HYBRID through llama-arch
915f1e3f
feat: Support GRANITE_MOE_HYBRID in llama-model
d0d3723a
style: Fix flake8 errors
2ca34162
fix: Fix recurrent cache get after rebase
3c22e1de
fix: Fix hybrid granite implementation for signature changes in build…
08493bff
refactor: Refactor relationship between non-hybrid classes and hybrid…
ed150125
refactor: Implement the full copy-paste version to duplicate the laye…
40e23469
refactor: Rename llm_build_hybrid_mamba -> llm_build_granite_hybrid
a9dcc845
gabe-l-hart force-pushed from 443e7e74 to a9dcc845 (79 days ago)
mamba : fix mismatched new and delete size for llm_build_mamba
dc1d109d
Merge remote-tracking branch 'origin/compilade/mamba2' into mamba2-sync
fdc9a8da
Merge branch 'mamba2-sync' into GraniteFour
2b263e63
memory : correctly handle failure in apply()
66a7a432
Merge remote-tracking branch 'origin/master' into GraniteFour
8cb4df55
Merge remote-tracking branch 'origin/gg/memory-is-fail' into GraniteFour
f13f5bcd
gabe-l-hart force-pushed from 7613fb2f to f13f5bcd (75 days ago)
Merge remote-tracking branch 'origin/master' into GraniteFour
6cac5868
Merge remote-tracking branch 'origin/master' into GraniteFour
28361c40
Merge remote-tracking branch 'origin/master' into GraniteFour
bb2bb372
gabe-l-hart marked this pull request as ready for review (73 days ago)
gabe-l-hart commented on 2025-07-02 (five comments)
style: Remove TODO for adding first hybrid models to the switch
8f9b5130
fix: Fix bad merge in tensor_mapping.py w/ SSM_NORM
eaec9c68
gabe-l-hart commented on 2025-07-02 (three comments)
fix: Fix bad merge resolution with variable renames/moves in llm_buil…
1085cf9c
docs: Fix comment about duplicate key check
b6d772f9
ggerganov commented on 2025-07-02 (two comments)
fix: Conform to standard way of initializing inp_out_ids
bb590f2e
Merge remote-tracking branch 'origin/master' into GraniteFour
1c21a043
Merge branch 'master' into compilade/refactor-kv-cache
2bcaf64e
convert : fix jamba conv1d shape squeezing
908e6559
Merge remote-tracking branch 'origin/master' into GraniteFour
d7f4d737
Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int…
e1001534
fix: Fix input initialization in granite_hybrid after removal of hybr…
4b5f6735
fix: Use llm_graph_context_mamba in llm_build_granite_hybrid
0796726b
refactor: Refactor mamba2/granite/jamba/granite_hybrid relationships …
f7fa1b15
gabe-l-hart marked this pull request as draft (72 days ago)
Merge branch 'master' into compilade/refactor-kv-cache
4682e21c
graph : add back hybrid memory graph input
20f8e43e
model : add Jamba to Mamba-specific hparams printing
07c252f0
Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int…
2e1431fe
fix: Fix input setup after upstream merge
5c32e80d
Merge remote-tracking branch 'origin/master' into GraniteFour
f9d6dd1e
Merge branch 'master' into compilade/refactor-kv-cache
f7163582
Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int…
65f3d9e9
Merge remote-tracking branch 'origin/master' into GraniteFour
257d436d
Merge branch 'master' into compilade/refactor-kv-cache
b0b280ea
jamba : remove redundant nullptr initializations
db5ff0cc
model : remove unnecessary prefix for tensor loading constants
2f39cd7b
model : use ggml_swiglu_split for Mamba
f7c7a926
CISC requested changes on 2025-07-08
feat: Add support for dense FFN in GraniteMoeHybrid
8a1ea3ef
feat: Add support for dense FFN tensor names on c++ side
12c50f13
Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int…
0b84bd52
Merge branch 'master' into compilade/refactor-kv-cache
a60a24be
Merge remote-tracking branch 'origin/master' into GraniteFour
1334c71b
fix: Use child inputs for Falcon H1 after merge resolution
f8b81c0e
fix: Remove unnecessary prefix on tensor constants
0583d952
gabe-l-hart force-pushed from 4b93a2a0 to 0583d952 (66 days ago)
model : make falcon-h1 use shared mamba2 layer builder
7f3955a0
Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int…
fa159cf4
fix: Revert order changes for Falcon H1 to stay consistent with upstream
44cda757
gguf-py : avoid adding duplicate tensor mappings for Jamba
4d6a179c
Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int…
d1d54d87
Merge remote-tracking branch 'origin/master' into GraniteFour
fe34d0e0
gabe-l-hart marked this pull request as ready for review (66 days ago)
gabe-l-hart commented on 2025-07-09
refactor: Collapse Bamba and GraniteMoeHybrid into GraniteHybrid
68756970
refactor: Remove use of diamond inheritance
8dd7f977
Merge remote-tracking branch 'origin/master' into GraniteFour
2b36420a
compilade commented on 2025-07-09
feat: Log mamba params for Granite Hybrid
dcf51e08
fix: Remove unused ssm_in_b
5b44f4e7
refactor: Remove ATTENTION_LAYER_INDICES hparam in favor of n_head_kv
4e9fef1a
ggerganov approved these changes on 2025-07-10
ggerganov requested a review from compilade (65 days ago)
ggerganov requested a review from CISC (65 days ago)
CISC approved these changes on 2025-07-10
fix: Remove unused template expansion for get_arr
d02d3ddb
Merge remote-tracking branch 'origin/master' into GraniteFour
afc17386
Merge remote-tracking branch 'origin/master' into GraniteFour
d7d5b01a
compilade commented on 2025-07-10
fix: Review cleanup in convert_hf_to_gguf
f43a8dc5
compilade commented on 2025-07-10
fix: Undo hidden warnings about duplicate identical keys in add_key_v…
63f1ed83
fix: If not using ROPE, context is "infinite"
f1485d2a
doc: Add a comment outlining expected duplicate key warnings
04883fc7
compilade approved these changes on 2025-07-10
fix: Remove unnecessary duplicate keys in converter
e53632b6
CISC merged 0aedae00 into master (64 days ago)
ggerganov added the hot label
gabe-l-hart deleted the GraniteFour branch (64 days ago)
Reviewers: compilade, CISC, ggerganov
Assignees: none
Labels: testing, Nvidia GPU, examples, python, server, ggml, Apple Metal, hot
Milestone: none