Hybrid recurrent cache #13979
gabe-l-hart force-pushed from fe814bff to f3bf43d9 129 days ago
gabe-l-hart force-pushed from f3bf43d9 to 50b8ad48 128 days ago
gabe-l-hart force-pushed from 50b8ad48 to 0893b4c8 128 days ago
gabe-l-hart force-pushed from 1990f3b5 to 85d2917f 126 days ago
gabe-l-hart force-pushed from 8a7e8ef2 to 6cf35ffa 126 days ago
gabe-l-hart force-pushed from 6cf35ffa to ab918bb5 126 days ago
gabe-l-hart force-pushed from ab918bb5 to 60ca3baf 123 days ago
gabe-l-hart force-pushed from 39a93b39 to 60ca3baf 122 days ago
gabe-l-hart force-pushed from 60ca3baf to 7958d844 121 days ago
gabe-l-hart force-pushed from 7958d844 to 36698767 121 days ago
gabe-l-hart marked this pull request as draft 121 days ago
gabe-l-hart force-pushed from 36698767 to 8c59841c 121 days ago
gabe-l-hart marked this pull request as ready for review 121 days ago
gabe-l-hart force-pushed from bb87dbf6 to b216ed37 121 days ago
gabe-l-hart force-pushed from b216ed37 to 1309384b 121 days ago
gabe-l-hart force-pushed from 1a7e23dd to b3c948a5 119 days ago
gabe-l-hart force-pushed from 6253c7c8 to ff2faeed 115 days ago
gabe-l-hart force-pushed from 0b104208 to 95b66986 115 days ago
feat: Add llama_model_is_hybrid API call (ec8fe17b)
feat: Add c++ side constants for attention layer indices hparam (5e2f2c38)
feat: Add support for distinguishing recurrent vs non-recurrent layer… (05f19580)
feat: Auto-fill hparams.recurrent_layer_arr based on whether the mode… (fc9e0b57)
refactor: rename *_is_hybrid -> *_is_hybrid_recurrent (fb26e95a)
feat: Add layer filter to recurrent cache (40e91878)
fix: Use per-layer sizing everywhere in kv caches (13332a75)
feat: First pass at llama_kv_cache_hybrid_recurrent (c71eaa37)
feat: Construct hybrid recurrent cache for hybrid recurrent models (423c8940)
fix: Fix wrong bool condition for split equal in hybrid cache (6c6ec000)
fix: Fix shift logic to defer to unified cache (cf03d4ae)
feat: Support hybrid recurrent in llama-graph (e3c16315)
fix: Fix logic for initializing inputs and attn layers for hybrid caches (a9b5fe98)
fix: Update recurrent cache for changes to remove intermediate kv_cac… (d3699366)
fix: Fix status for init_update sig for recurrent cache state (911e6944)
fix: Add missing padding to n_ctx for hybrid cache construction (de9297fd)
fix: Update clear signature for data argument after rebase (9c1a604a)
fix: Remove errant virtual destructor leftover from previous impl att… (f6d5f055)
fix: Use per-layer n_embd_k/v_s calls for mamba (1) layers (833dfb54)
refactor: Remove n_embd_k/v_s from unified cache (1dd12133)
refactor: Remove layer index from n_embd_k/v_s (b42c8b43)
refactor: Remove n_embd_k/v_gqa from recurrent cache (d5d7628b)
feat: Allow custom layer filters for hybrid recurrent (d8c929ff)
fix: Remove logits_all after rebase (1510016e)
fix: Remove llama_model_is_hybrid_Recurrent public API (7ba463b3)
refactor: Use llama_memory_state_ptr for child states in hybrid memor… (4ec4e6a8)
feat: Overhaul build_recurrent_state / build_inp_s_copy to match atte… (11cd80d5)
fix: Fix resize vs reserve and skip null tensors in size computation (9db44a2a)
fix: Fix initialization of child states (5046d412)
refactor: Use a common build_recurrent_state method that is cache-agn… (faf41199)
gabe-l-hart force-pushed from 95b66986 to faf41199 114 days ago
recurrent : rework graph inputs + add TODOs (59fee24c)
Merge pull request #2 from ggml-org/gabe-l-hart/HybridRecurrentCache (c80e68ca)
refactor: Make status and child states const in hybrid and iswa (8488f5e3)
refactor: Rename llama_kv_cache_[recurrent|hybrid_recurrent] to remov… (88213a95)
refactor!: Rename all k/v related values for recurrent/hybrid to r/s (8e39e04b)
ggerganov approved these changes on 2025-06-18
refacor: _recurrent -> _recr for brevity (6403f192)
style: Fix spacing for ref (d0565e88)
refactor: recurrent_layer() -> is_recurrent() (35c02336)
style: Fix spacing for size_s_bytes declaration (304f86e6)
ggerganov merged commit edc4a29e into master 113 days ago
gabe-l-hart deleted the HybridRecurrentCache branch 99 days ago
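Taken together, the commits in this PR describe a hybrid memory layout: llama_kv_cache_hybrid_recurrent holds an attention KV cache and a recurrent state cache as child states and routes each layer to one of them via a layer filter, auto-filled from hparams.recurrent_layer_arr. The snippet below is a minimal standalone sketch of that routing idea only, not the code this PR adds to llama.cpp; the names hybrid_cache, attention_kv_cache, recurrent_state_cache, and layer_filter_cb are hypothetical stand-ins.

```cpp
// Illustrative sketch only (not the llama.cpp implementation): a hybrid cache
// that owns two child caches and uses a per-layer filter to decide which one
// a given layer index belongs to, mirroring the commit messages above.
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

struct attention_kv_cache    { /* K/V tensors for attention layers       */ };
struct recurrent_state_cache { /* r/s state tensors for recurrent layers */ };

// A layer filter answers "is layer il recurrent?" for the hybrid cache.
using layer_filter_cb = std::function<bool(int32_t il)>;

struct hybrid_cache {
    attention_kv_cache    attn;          // child cache for attention layers
    recurrent_state_cache recr;          // child cache for recurrent layers
    layer_filter_cb       is_recurrent;  // routes a layer index to a child

    explicit hybrid_cache(layer_filter_cb filter)
        : is_recurrent(std::move(filter)) {}
};

int main() {
    // Hypothetical per-layer flags, analogous to hparams.recurrent_layer_arr:
    // layers 0-2 are recurrent (e.g. mamba), layer 3 uses attention.
    std::vector<bool> recurrent_layer_arr = {true, true, true, false};

    hybrid_cache cache([&](int32_t il) {
        return static_cast<bool>(recurrent_layer_arr[il]);
    });

    bool layer0_recurrent = cache.is_recurrent(0); // true  -> recurrent child
    bool layer3_recurrent = cache.is_recurrent(3); // false -> attention child
    (void) layer0_recurrent;
    (void) layer3_recurrent;
    return 0;
}
```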
Assignees: No one assigned
Labels: examples, python, server