llama.cpp
Hybrid recurrent cache
#13979
Merged

gabe-l-hart
ggerganov commented on 2025-06-03
compilade commented on 2025-06-02
gabe-l-hart force pushed from fe814bff to f3bf43d9 129 days ago
gabe-l-hart force pushed from f3bf43d9 to 50b8ad48 128 days ago
gabe-l-hart force pushed from 50b8ad48 to 0893b4c8 128 days ago
gabe-l-hart force pushed from 1990f3b5 to 85d2917f 126 days ago
gabe-l-hart force pushed from 8a7e8ef2 to 6cf35ffa 126 days ago
gabe-l-hart force pushed from 6cf35ffa to ab918bb5 126 days ago
gabe-l-hart force pushed from ab918bb5 to 60ca3baf 123 days ago
gabe-l-hart
gabe-l-hart requested a review from ngxson 123 days ago
github-actions added examples
github-actions added python
github-actions added server
gabe-l-hart
ggerganov
gabe-l-hart force pushed from 39a93b39 to 60ca3baf 122 days ago
gabe-l-hart
gabe-l-hart
gabe-l-hart force pushed from 60ca3baf to 7958d844 121 days ago
gabe-l-hart force pushed from 7958d844 to 36698767 121 days ago
gabe-l-hart marked this pull request as draft 121 days ago
gabe-l-hart
gabe-l-hart force pushed from 36698767 to 8c59841c 121 days ago
gabe-l-hart marked this pull request as ready for review 121 days ago
gabe-l-hart
gabe-l-hart commented on 2025-06-10
gabe-l-hart commented on 2025-06-10
gabe-l-hart commented on 2025-06-10
gabe-l-hart commented on 2025-06-10
gabe-l-hart commented on 2025-06-10
gabe-l-hart commented on 2025-06-10
gabe-l-hart commented on 2025-06-10
gabe-l-hart commented on 2025-06-10
gabe-l-hart commented on 2025-06-10
gabe-l-hart commented on 2025-06-10
gabe-l-hart commented on 2025-06-10
gabe-l-hart
gabe-l-hart force pushed from bb87dbf6 to b216ed37 121 days ago
younesbelkada commented on 2025-06-11
gabe-l-hart force pushed from b216ed37 to 1309384b 121 days ago
ggerganov commented on 2025-06-11
ggerganov commented on 2025-06-11
ggerganov commented on 2025-06-12
gabe-l-hart force pushed from 1a7e23dd to b3c948a5 119 days ago
gabe-l-hart commented on 2025-06-13
compilade commented on 2025-06-14
gabe-l-hart force pushed from 6253c7c8 to ff2faeed 115 days ago
gabe-l-hart force pushed from 0b104208 to 95b66986 115 days ago
gabe-l-hart
gabe-l-hart feat: Add llama_model_is_hybrid API call
ec8fe17b
gabe-l-hart feat: Add c++ side constants for attention layer indices hparam
5e2f2c38
gabe-l-hart feat: Add support for distinguishing recurrent vs non-recurrent layer…
05f19580
gabe-l-hart feat: Auto-fill hparams.recurrent_layer_arr based on whether the mode…
fc9e0b57
gabe-l-hart refactor: rename *_is_hybrid -> *_is_hybrid_recurrent
fb26e95a
gabe-l-hart feat: Add layer filter to recurrent cache
40e91878
gabe-l-hart fix: Use per-layer sizing everywhere in kv caches
13332a75
gabe-l-hart feat: First pass at llama_kv_cache_hybrid_recurrent
c71eaa37
gabe-l-hart feat: Construct hybrid recurrent cache for hybrid recurrent models
423c8940
gabe-l-hart fix: Fix wrong bool condition for split equal in hybrid cache
6c6ec000
gabe-l-hart fix: Fix shift logic to defer to unified cache
cf03d4ae
gabe-l-hart feat: Support hybrid recurrent in llama-graph
e3c16315
gabe-l-hart fix: Fix logic for initializing inputs and attn layers for hybrid caches
a9b5fe98
gabe-l-hart fix: Update recurrent cache for changes to remove intermediate kv_cac…
d3699366
gabe-l-hart fix: Fix status for init_update sig for recurrent cache state
911e6944
gabe-l-hart fix: Add missing padding to n_ctx for hybrid cache construction
de9297fd
gabe-l-hart fix: Update clear signature for data argument after rebase
9c1a604a
gabe-l-hart fix: Remove errant virtual destructor leftover from previous impl att…
f6d5f055
gabe-l-hart fix: Use per-layer n_embd_k/v_s calls for mamba (1) layers
833dfb54
gabe-l-hart refactor: Remove n_embd_k/v_s from unified cache
1dd12133
gabe-l-hart refactor: Remove layer index from n_embd_k/v_s
b42c8b43
gabe-l-hart refactor: Remove n_embd_k/v_gqa from recurrent cache
d5d7628b
gabe-l-hart feat: Allow custom layer filters for hybrid recurrent
d8c929ff
gabe-l-hart fix: Remove logits_all after rebase
1510016e
gabe-l-hart fix: Remove llama_model_is_hybrid_Recurrent public API
7ba463b3
gabe-l-hart refactor: Use llama_memory_state_ptr for child states in hybrid memor…
4ec4e6a8
gabe-l-hart feat: Overhaul build_recurrent_state / build_inp_s_copy to match atte…
11cd80d5
gabe-l-hart fix: Fix resize vs reserve and skip null tensors in size computation
9db44a2a
gabe-l-hart fix: Fix initialization of child states
5046d412
gabe-l-hart refactor: Use a common build_recurrent_state method that is cache-agn…
faf41199
gabe-l-hart force pushed from 95b66986 to faf41199 114 days ago
ggerganov recurrent : rework graph inputs + add TODOs
59fee24c
ggerganov
compilade
gabe-l-hart
gabe-l-hart Merge pull request #2 from ggml-org/gabe-l-hart/HybridRecurrentCache
c80e68ca
gabe-l-hart refactor: Make status and child states const in hybrid and iswa
8488f5e3
gabe-l-hart
gabe-l-hart
gabe-l-hart
gabe-l-hart refactor: Rename llama_kv_cache_[recurrent|hybrid_recurrent] to remov…
88213a95
gabe-l-hart
compilade
gabe-l-hart
ggerganov
gabe-l-hart
gabe-l-hart refactor!: Rename all k/v related values for recurrent/hybrid to r/s
8e39e04b
gabe-l-hart
ggerganov approved these changes on 2025-06-18
gabe-l-hart
gabe-l-hart refacor: _recurrent -> _recr for brevity
6403f192
gabe-l-hart style: Fix spacing for ref
d0565e88
gabe-l-hart refactor: recurrent_layer() -> is_recurrent()
35c02336
gabe-l-hart
gabe-l-hart
ggerganov commented on 2025-06-18
gabe-l-hart style: Fix spacing for size_s_bytes declaration
304f86e6
ggerganov merged edc4a29e into master 113 days ago
compilade commented on 2025-06-19
gabe-l-hart deleted the HybridRecurrentCache branch 99 days ago
