Granite Four #13550

CISC merged 173 commits into ggml-org:master from gabe-l-hart:GraniteFour
gabe-l-hart
compilade wip: llama : separate recurrent states from the KV cache
271104c6
compilade llama : use std::find for seq_nodes in llama_rs_cache
8db1e4d4
compilade llama : state checkpoints for recurrent models
0028010d
compilade llama : correctly handle more edge cases for the rs cache
0c8b3b20
compilade Merge branch 'master' into compilade/refactor-kv-cache
d66849f6
compilade llama : rename many llama_kv_cache_* functions
a09db95e
compilade Merge branch 'master' into compilade/refactor-kv-cache
c460ff1a
compilade llama : remove useless return value for some llama_cache_* functions
b6fafd17
compilade Merge branch 'master' into compilade/refactor-kv-cache
b7ec12eb
compilade Merge branch 'master' into compilade/refactor-kv-cache
3b57b55c
compilade llama : rethink recurrent state cell counts
7e13f19f
compilade llama : support Jamba
cbc743e6
compilade Merge branch 'master' into compilade/refactor-kv-cache
0fd13e94
compilade llama : fix BERT inference without KV cache
61a88a1d
compilade convert-hf : check for unprocessed Jamba experts
ea2e63e9
compilade convert-hf : support Mini-Jamba conversion
fc59407e
compilade llama : fix Jamba quantization sanity checks
181dadf2
compilade llama : sequence-length-aware batch splitting
3a414b0b
compilade Merge branch 'master' into compilade/refactor-kv-cache
4e4c41e5
compilade llama : use equal-sequence-length sub-batches for recurrent models
3587a949
compilade Merge branch 'master' into compilade/refactor-kv-cache
5d3c7b95
compilade llama : fix batch split output count for embeddings
72eea492
compilade llama : minimize swaps when reordering logits
18d1c140
compilade llama : fix edge case finding batch seq_id of split recurrent cell
61200ef2
compilade llama : avoid copies for simple batch splits
eb589d5e
compilade llama : use im2col and mul_mat to perform convolution for Mamba
8fb57ac0
compilade llama : fix .base() compilation error on Windows
17f6c1ef
compilade llama : allow doing the equivalent of SSM_CONV with SUM_ROWS and MUL
fee3c1d7
compilade Merge branch 'master' into compilade/refactor-kv-cache
6840ac0b
compilade llama : rename llama_cache to llama_past
372482df
compilade examples : replace llama_kv_cache_seq_* with llama_past_seq_*
43d8d4bf
compilade Merge branch 'master' into compilade/refactor-kv-cache
ff794f55
compilade mamba : fix non-contiguous usage of ggml_silu
33425a7e
compilade Merge branch 'master' into compilade/refactor-kv-cache
10c3c419
compilade Merge branch 'master' into compilade/refactor-kv-cache
9b38f8bf
compilade llama : initial Mamba-2 support
1f0fea70
compilade ggml : SIMD ggml_ssm_scan for Mamba-2
dceff23f
compilade llama : support running Mamba-Codestral-7B-v0.1
2bfe9de6
compilade llama : fix Mamba-2 conv state saving
aff96920
compilade llama : remove unused variable
e04910dc
compilade llama : add missing break
fa358e70
compilade convert_hf : prefer SentencePiece tokenizer for Mamba-2 when present
38913dc8
compilade Merge branch 'master' into compilade/refactor-kv-cache
bc320ef6
compilade llama : session saving and reloading for hybrid models
fcb889cf
compilade Merge branch 'master' into compilade/refactor-kv-cache
a03e32a3
compilade convert_hf : fix Jamba conversion
9d3f44da
compilade llama : fix mixed signedness comparison
5f62db79
compilade llama : use unused n_embd_k_gqa in k_shift
375de5b1
compilade llama : begin renaming llama_past back to llama_kv_cache
4bb4b22a
compilade Merge branch 'master' into compilade/refactor-kv-cache
63ac36b2
compilade Merge branch 'master' into compilade/mamba2
0e601caf
compilade llama : avoid redundant state copy for Mamba 1 and 2
273e7a49
compilade Merge branch 'master' into compilade/mamba2
7d6cb368
compilade metal : attempt to adapt SSM_SCAN for Mamba-2
2c77d799
compilade metal : fix SSM_SCAN pipeline scope
87b97d08
compilade metal : use log and exp instead of log1pf and expf in SSM_SCAN
03d0e6ea
compilade metal : remove unused arguments for SSM_SCAN
7a351abc
compilade metal : add back n_seqs to SSM_SCAN args
8b15bc6f
compilade metal : fix SSM_SCAN state head offset
5b8ec2b9
compilade metal : fix wrong number of tokens per sequence in SSM_SCAN
62b09b34
compilade Merge branch 'master' into compilade/refactor-kv-cache
124c222f
compilade Merge branch 'master' into compilade/mamba2
038d9583
compilade ggml : remove unused fast broadcast path in GGML_MUL
805512a7
compilade Merge branch 'master' into compilade/mamba2
7d16e1bc
compilade ggml : avoid multiply by D in GGML_OP_SSM_SCAN
3bc7103d
compilade Merge branch 'master' into compilade/mamba2
8d8f0657
compilade convert : fix flake8 lint
b4e9c599
compilade llama : remove implicit recurrent state rollbacks
8006f3b3
compilade Merge branch 'master' into compilade/refactor-kv-cache
691698e1
compilade llama : partially apply clang-format style
e3fe6120
compilade Merge branch 'master' into compilade/mamba2
1ee6c482
compilade Merge branch 'master' into compilade/mamba2
c9ecf620
compilade Merge branch 'master' into compilade/mamba2
35d06fac
compilade metal : fix confusion between ; and ,
cf4f0a41
compilade metal : add missing args for nb references in ssm_scan_f32_group
6def5cd7
compilade metal : single-user mamba2 inference works
791998b4
compilade kv-cache : remove const_cast when setting inputs for s_copy
94c3d530
compilade Merge branch 'master' into compilade/mamba2
929fe85d
compilade convert : avoid AutoConfig for Mamba and Mamba2 hparams
d55b0d06
compilade kv-cache : allow context shift for recurrent models
e94f3932
github-actions added testing
github-actions added python
github-actions added ggml
github-actions added Apple Metal
gabe-l-hart force pushed from a89b966f to 00abc0df 121 days ago
gabe-l-hart force pushed from 00abc0df to ccd0d402 116 days ago
gabe-l-hart force pushed from ccd0d402 to c6225f32 113 days ago
gabe-l-hart force pushed from c6225f32 to 57d52a19 109 days ago
gabe-l-hart force pushed from 57d52a19 to 63e099bb 108 days ago
github-actions added examples
github-actions added server
gabe-l-hart force pushed from 63e099bb to 311bccc0 108 days ago
gabe-l-hart force pushed from 311bccc0 to 857fa82a 107 days ago
gabe-l-hart force pushed from 857fa82a to 2b20a09d 106 days ago
gabe-l-hart force pushed from 2b20a09d to 58085c7e 101 days ago
gabe-l-hart force pushed from 58085c7e to 3e8faff7 100 days ago
gabe-l-hart force pushed from 9752355a to 6b8d79bc 96 days ago
compilade Merge branch 'master' into compilade/mamba2
9864bfcd
compilade graph : fix recurrent state copies when avoiding copies
2fa5f2ce
compilade ggml : fix mamba2 ssm scan when compiled with SVE
757aa623
gabe-l-hart force pushed from 6b8d79bc to 8d2c8db3 89 days ago
gabe-l-hart force pushed from 8d2c8db3 to 2bfd7563 89 days ago
gabe-l-hart force pushed from 2bfd7563 to 71681e4f 87 days ago
compilade Merge branch 'master' into compilade/mamba2
a42f2394
compilade cuda : implement ssm scan for Mamba2
f8c7caee
compilade Merge branch 'master' into compilade/mamba2
830e5542
gabe-l-hart force pushed from 71681e4f to faba0c3b 86 days ago
github-actions added Nvidia GPU
compilade Merge branch 'master' into compilade/mamba2
afdb6692
gabe-l-hart feat: Add conversion for Bamba models
28881af1
gabe-l-hart feat: Add Granite 4 conversion
c43259bd
gabe-l-hart feat: Plumb bamba through llama-arch
26816fd6
gabe-l-hart feat: Add bamba to llama_arch_is_hybrid_recurrent
b901947a
gabe-l-hart feat: Add optional mamba ssm_in bias tensor
fc56325a
gabe-l-hart feat: Add template specialization for get_arr to load a vector<uint32…
b3453dc9
gabe-l-hart feat: Use an explicit bool to determine mamaba vs mamba2
13e8d3df
gabe-l-hart feat: Isolate mamba(2) and granite attention layer building in static…
b435dce2
gabe-l-hart fix: Use per-layer sizes in granite build_attention_layer
3d4c36b5
gabe-l-hart feat: First (broken) pass at end-to-end Bamba implementation
0d28bf61
gabe-l-hart fix: Only do Granite multipliers if set
ed6216a7
gabe-l-hart refactor: Pull granite ffn portion into a static function and reuse i…
a6f9f90d
gabe-l-hart feat(py): Allow gguf duplicate keys if they match by value and type
de4d8701
gabe-l-hart refactor(py): Simplify granitemoehybrid conversion to use parents better
7c2b0b80
gabe-l-hart feat: Add GRANITE_MOE_HYBRID through llama-arch
915f1e3f
gabe-l-hart feat: Support GRANITE_MOE_HYBRID in llama-model
d0d3723a
gabe-l-hart style: Fix flake8 errors
2ca34162
gabe-l-hart fix: Fix recurrent cache get after rebase
3c22e1de
gabe-l-hart fix: Fix hybrid granite implementation for signature changes in build…
08493bff
gabe-l-hart refactor: Refactor relationship between non-hybrid classes and hybrid…
ed150125
gabe-l-hart refactor: Implement the full copy-paste version to duplicate the laye…
40e23469
gabe-l-hart refactor: Rename llm_build_hybrid_mamba -> llm_build_granite_hybrid
a9dcc845
gabe-l-hart force pushed from 443e7e74 to a9dcc845 79 days ago
compilade mamba : fix mismatched new and delete size for llm_build_mamba
dc1d109d
gabe-l-hart Merge remote-tracking branch 'origin/compilade/mamba2' into mamba2-sync
fdc9a8da
gabe-l-hart Merge branch 'mamba2-sync' into GraniteFour
2b263e63
ggerganov memory : correctly handle failure in apply()
66a7a432
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFour
8cb4df55
gabe-l-hart Merge remote-tracking branch 'origin/gg/memory-is-fail' into GraniteFour
f13f5bcd
gabe-l-hart force pushed from 7613fb2f to f13f5bcd 75 days ago
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFour
6cac5868
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFour
28361c40
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFour
bb2bb372
gabe-l-hart marked this pull request as ready for review 73 days ago
gabe-l-hart
gabe-l-hart commented on 2025-07-02
gabe-l-hart commented on 2025-07-02
gabe-l-hart commented on 2025-07-02
gabe-l-hart commented on 2025-07-02
gabe-l-hart commented on 2025-07-02
gabe-l-hart style: Remove TODO for adding first hybrid models to the switch
8f9b5130
gabe-l-hart fix: Fix bad merge in tensor_mapping.py w/ SSM_NORM
eaec9c68
gabe-l-hart commented on 2025-07-02
gabe-l-hart commented on 2025-07-02
gabe-l-hart commented on 2025-07-02
gabe-l-hart fix: Fix bad merge resolution with variable renames/moves in llm_buil…
1085cf9c
gabe-l-hart docs: Fix comment about duplicate key check
b6d772f9
ggerganov commented on 2025-07-02
ggerganov commented on 2025-07-02
gabe-l-hart fix: Conform to standard way of initializing inp_out_ids
bb590f2e
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFour
1c21a043
compilade Merge branch 'master' into compilade/refactor-kv-cache
2bcaf64e
compilade convert : fix jamba conv1d shape squeezing
908e6559
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFour
d7f4d737
gabe-l-hart Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int…
e1001534
gabe-l-hart fix: Fix input initialization in granite_hybrid after removal of hybr…
4b5f6735
gabe-l-hart fix: Use llm_graph_context_mamba in llm_build_granite_hybrid
0796726b
gabe-l-hart refactor: Refactor mamba2/granite/jamba/granite_hybrid relationships …
f7fa1b15
gabe-l-hart
gabe-l-hart marked this pull request as draft 72 days ago
compilade Merge branch 'master' into compilade/refactor-kv-cache
4682e21c
compilade graph : add back hybrid memory graph input
20f8e43e
compilade model : add Jamba to Mamba-specific hparams printing
07c252f0
gabe-l-hart Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int…
2e1431fe
gabe-l-hart fix: Fix input setup after upstream merge
5c32e80d
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFour
f9d6dd1e
compilade Merge branch 'master' into compilade/refactor-kv-cache
f7163582
gabe-l-hart Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int…
65f3d9e9
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFour
257d436d
compilade Merge branch 'master' into compilade/refactor-kv-cache
b0b280ea
compilade jamba : remove redundant nullptr initializations
db5ff0cc
compilade model : remove unnecessary prefix for tensor loading constants
2f39cd7b
compilade model : use ggml_swiglu_split for Mamba
f7c7a926
CISC requested changes on 2025-07-08
gabe-l-hart feat: Add support for dense FFN in GraniteMoeHybrid
8a1ea3ef
gabe-l-hart feat: Add support for dense FFN tensor names on c++ side
12c50f13
gabe-l-hart Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int…
0b84bd52
compilade Merge branch 'master' into compilade/refactor-kv-cache
a60a24be
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFour
1334c71b
gabe-l-hart fix: Use child inputs for Falcon H1 after merge resolution
f8b81c0e
gabe-l-hart fix: Remove unnecessary prefix on tensor constants
0583d952
gabe-l-hart force pushed from 4b93a2a0 to 0583d952 66 days ago
compilade model : make falcon-h1 use shared mamba2 layer builder
7f3955a0
gabe-l-hart Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int…
fa159cf4
gabe-l-hart fix: Revert order changes for Falcon H1 to stay consistent with upstream
44cda757
compilade gguf-py : avoid adding duplicate tensor mappings for Jamba
4d6a179c
gabe-l-hart Merge remote-tracking branch 'origin/compilade/refactor-kv-cache' int…
d1d54d87
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFour
fe34d0e0
gabe-l-hart marked this pull request as ready for review 66 days ago
gabe-l-hart
gabe-l-hart commented on 2025-07-09
ggerganov
gabe-l-hart
CISC
gabe-l-hart refactor: Collapse Bamba and GraniteMoeHybrid into GraniteHybrid
68756970
gabe-l-hart refactor: Remove use of diamond inheritance
8dd7f977
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFour
2b36420a
gabe-l-hart
compilade commented on 2025-07-09
gabe-l-hart feat: Log mamba params for Granite Hybrid
dcf51e08
gabe-l-hart fix: Remove unused ssm_in_b
5b44f4e7
gabe-l-hart refactor: Remove ATTENTION_LAYER_INDICES hparam in favor of n_head_kv
4e9fef1a
ggerganov approved these changes on 2025-07-10
ggerganov requested a review from compilade 65 days ago
ggerganov requested a review from CISC 65 days ago
CISC approved these changes on 2025-07-10
gabe-l-hart fix: Remove unused template expansion for get_arr
d02d3ddb
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFour
afc17386
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFour
d7d5b01a
CISC
compilade commented on 2025-07-10
gabe-l-hart fix: Review cleanup in convert_hf_to_gguf
f43a8dc5
gabe-l-hart
compilade commented on 2025-07-10
gabe-l-hart fix: Undo hidden warnings about duplicate identical keys in add_key_v…
63f1ed83
gabe-l-hart fix: If not using ROPE, context is "infinite"
f1485d2a
gabe-l-hart doc: Add a comment outlining expected duplicate key warnings
04883fc7
compilade approved these changes on 2025-07-10
gabe-l-hart fix: Remove unnecessary duplicate keys in converter
e53632b6
gabe-l-hart
CISC merged 0aedae00 into master 64 days ago
gabe-l-hart
ggerganov added hot
gabe-l-hart deleted the GraniteFour branch 64 days ago
