Hybrid recurrent cache #13979
gabe-l-hart force-pushed from fe814bff to f3bf43d9 129 days ago
gabe-l-hart force-pushed from f3bf43d9 to 50b8ad48 128 days ago
gabe-l-hart force-pushed from 50b8ad48 to 0893b4c8 128 days ago
gabe-l-hart force-pushed from 1990f3b5 to 85d2917f 126 days ago
gabe-l-hart force-pushed from 8a7e8ef2 to 6cf35ffa 126 days ago
gabe-l-hart force-pushed from 6cf35ffa to ab918bb5 126 days ago
gabe-l-hart force-pushed from ab918bb5 to 60ca3baf 123 days ago
gabe-l-hart force-pushed from 39a93b39 to 60ca3baf 122 days ago
gabe-l-hart force-pushed from 60ca3baf to 7958d844 121 days ago
gabe-l-hart force-pushed from 7958d844 to 36698767 121 days ago
gabe-l-hart marked this pull request as draft 121 days ago
gabe-l-hart force-pushed from 36698767 to 8c59841c 121 days ago
gabe-l-hart marked this pull request as ready for review 121 days ago
gabe-l-hart force-pushed from bb87dbf6 to b216ed37 121 days ago
gabe-l-hart force-pushed from b216ed37 to 1309384b 121 days ago
gabe-l-hart force-pushed from 1a7e23dd to b3c948a5 119 days ago
gabe-l-hart force-pushed from 6253c7c8 to ff2faeed 115 days ago
gabe-l-hart force-pushed from 0b104208 to 95b66986 115 days ago
feat: Add llama_model_is_hybrid API call (ec8fe17b)
feat: Add c++ side constants for attention layer indices hparam (5e2f2c38)
feat: Add support for distinguishing recurrent vs non-recurrent layer… (05f19580)
feat: Auto-fill hparams.recurrent_layer_arr based on whether the mode… (fc9e0b57)
refactor: rename *_is_hybrid -> *_is_hybrid_recurrent (fb26e95a)
feat: Add layer filter to recurrent cache (40e91878)
fix: Use per-layer sizing everywhere in kv caches (13332a75)
feat: First pass at llama_kv_cache_hybrid_recurrent (c71eaa37)
feat: Construct hybrid recurrent cache for hybrid recurrent models (423c8940)
fix: Fix wrong bool condition for split equal in hybrid cache (6c6ec000)
fix: Fix shift logic to defer to unified cache (cf03d4ae)
feat: Support hybrid recurrent in llama-graph (e3c16315)
fix: Fix logic for initializing inputs and attn layers for hybrid caches (a9b5fe98)
fix: Update recurrent cache for changes to remove intermediate kv_cac… (d3699366)
fix: Fix status for init_update sig for recurrent cache state (911e6944)
fix: Add missing padding to n_ctx for hybrid cache construction (de9297fd)
fix: Update clear signature for data argument after rebase (9c1a604a)
fix: Remove errant virtual destructor leftover from previous impl att… (f6d5f055)
fix: Use per-layer n_embd_k/v_s calls for mamba (1) layers (833dfb54)
refactor: Remove n_embd_k/v_s from unified cache (1dd12133)
refactor: Remove layer index from n_embd_k/v_s (b42c8b43)
refactor: Remove n_embd_k/v_gqa from recurrent cache (d5d7628b)
feat: Allow custom layer filters for hybrid recurrent (d8c929ff)
fix: Remove logits_all after rebase (1510016e)
fix: Remove llama_model_is_hybrid_Recurrent public API (7ba463b3)
refactor: Use llama_memory_state_ptr for child states in hybrid memor… (4ec4e6a8)
feat: Overhaul build_recurrent_state / build_inp_s_copy to match atte… (11cd80d5)
fix: Fix resize vs reserve and skip null tensors in size computation (9db44a2a)
fix: Fix initialization of child states (5046d412)
refactor: Use a common build_recurrent_state method that is cache-agn… (faf41199)
gabe-l-hart force-pushed from 95b66986 to faf41199 114 days ago
recurrent : rework graph inputs + add TODOs (59fee24c)
Merge pull request #2 from ggml-org/gabe-l-hart/HybridRecurrentCache (c80e68ca)
refactor: Make status and child states const in hybrid and iswa (8488f5e3)
refactor: Rename llama_kv_cache_[recurrent|hybrid_recurrent] to remov… (88213a95)
refactor!: Rename all k/v related values for recurrent/hybrid to r/s (8e39e04b)
ggerganov approved these changes on 2025-06-18
refacor: _recurrent -> _recr for brevity (6403f192)
style: Fix spacing for ref (d0565e88)
refactor: recurrent_layer() -> is_recurrent() (35c02336)
style: Fix spacing for size_s_bytes declaration (304f86e6)
ggerganov merged commit edc4a29e into master 113 days ago
gabe-l-hart deleted the HybridRecurrentCache branch 99 days ago
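Taken together, the commits in this PR describe a hybrid memory layout: llama_kv_cache_hybrid_recurrent holds an attention KV cache and a recurrent state cache as child states and routes each layer to one of them via a layer filter, auto-filled from hparams.recurrent_layer_arr. The snippet below is a minimal standalone sketch of that routing idea only, not the code this PR adds to llama.cpp; the names hybrid_cache, attention_kv_cache, recurrent_state_cache, and layer_filter_cb are hypothetical stand-ins.

```cpp
// Illustrative sketch only (not the llama.cpp implementation): a hybrid cache
// that owns two child caches and uses a per-layer filter to decide which one
// a given layer index belongs to, mirroring the commit messages above.
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

struct attention_kv_cache    { /* K/V tensors for attention layers       */ };
struct recurrent_state_cache { /* r/s state tensors for recurrent layers */ };

// A layer filter answers "is layer il recurrent?" for the hybrid cache.
using layer_filter_cb = std::function<bool(int32_t il)>;

struct hybrid_cache {
    attention_kv_cache    attn;          // child cache for attention layers
    recurrent_state_cache recr;          // child cache for recurrent layers
    layer_filter_cb       is_recurrent;  // routes a layer index to a child

    explicit hybrid_cache(layer_filter_cb filter)
        : is_recurrent(std::move(filter)) {}
};

int main() {
    // Hypothetical per-layer flags, analogous to hparams.recurrent_layer_arr:
    // layers 0-2 are recurrent (e.g. mamba), layer 3 uses attention.
    std::vector<bool> recurrent_layer_arr = {true, true, true, false};

    hybrid_cache cache([&](int32_t il) {
        return static_cast<bool>(recurrent_layer_arr[il]);
    });

    bool layer0_recurrent = cache.is_recurrent(0); // true  -> recurrent child
    bool layer3_recurrent = cache.is_recurrent(3); // false -> attention child
    (void) layer0_recurrent;
    (void) layer3_recurrent;
    return 0;
}
```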
Assignees: No one assigned
Labels: examples, python, server