llama.cpp
c71eaa37 - feat: First pass at llama_kv_cache_hybrid_recurrent

Commit

332 days ago

feat: First pass at llama_kv_cache_hybrid_recurrent

This follows the pattern in iswa, where the two child caches are held explicitly, to support the case where a model requires a single attention cache and a single recurrent cache, and each layer uses exactly one of the caches.

This is a rewrite of the more generic approach in the original hybrid cache PR: https://github.com/ggml-org/llama.cpp/pull/13276

Branch: HybridRecurrentCache

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
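The layout described in this commit message (one explicit attention child, one explicit recurrent child, each layer served by exactly one of them) can be sketched as below. This is a hypothetical, minimal C++ illustration under stated assumptions, not the code from this commit: the type names (`attn_cache_sketch`, `recurrent_cache_sketch`, `hybrid_recurrent_cache_sketch`) and the `is_recurrent` predicate are invented for illustration, and the placeholder children only record layer indices instead of holding real cache state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical placeholder for the attention (KV) child cache.
// NOT the real llama.cpp class; it only records which layers it serves.
struct attn_cache_sketch {
    std::vector<int32_t> layers;
};

// Hypothetical placeholder for the recurrent (state) child cache.
struct recurrent_cache_sketch {
    std::vector<int32_t> layers;
};

// Hypothetical hybrid cache: holds exactly one attention cache and exactly
// one recurrent cache as named members (rather than a generic list of
// children) and assigns every layer to exactly one of them.
class hybrid_recurrent_cache_sketch {
public:
    hybrid_recurrent_cache_sketch(int32_t n_layer,
                                  const std::function<bool(int32_t)> & is_recurrent) {
        assert(n_layer >= 0);
        for (int32_t il = 0; il < n_layer; ++il) {
            // Route each layer to one child cache, never both.
            if (is_recurrent(il)) {
                recr_.layers.push_back(il);
            } else {
                attn_.layers.push_back(il);
            }
        }
    }

    // Explicit accessors for the two children.
    const attn_cache_sketch      & attn() const { return attn_; }
    const recurrent_cache_sketch & recr() const { return recr_; }

    // True if layer il was assigned to the recurrent child cache.
    bool layer_is_recurrent(int32_t il) const {
        return std::find(recr_.layers.begin(), recr_.layers.end(), il) != recr_.layers.end();
    }

private:
    attn_cache_sketch      attn_;
    recurrent_cache_sketch recr_;
};
```

For example, `hybrid_recurrent_cache_sketch cache(4, [](int32_t il) { return il % 2 == 0; });` would assign layers 0 and 2 to the recurrent child and layers 1 and 3 to the attention child. Holding the two children as named members keeps per-layer routing trivial, at the cost of the generality that the earlier approach in PR 13276 aimed for.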

References

#13979 - Hybrid recurrent cache

Author

gabe-l-hart

Committer

gabe-l-hart

Parents

13332a75
