llama.cpp
kv-cache : separate recurrent vs non-recurrent impl
#12799
Merged

ggerganov merged 29 commits into master from gg/llama-kv-cache-v6
ggerganov force pushed from 19eb81e0 246 days ago
ggerganov force pushed to d953616e 244 days ago
ggerganov force pushed from ed8942a3 to 2c3547e5 239 days ago
ggerganov marked this pull request as ready for review 237 days ago
ggerganov requested a review from slaren 237 days ago
ggerganov force pushed to d31e31da 237 days ago
ggerganov force pushed to dec80ace 236 days ago
ggerganov force pushed from dec80ace 233 days ago
ggerganov force pushed to 65cde6d4 233 days ago
ggerganov force pushed 231 days ago
ggerganov force pushed to 7e4b5459 231 days ago
ggerganov force pushed to eb623f2f 231 days ago
slaren commented on 2025-04-30
ggerganov commented on 2025-04-30
slaren commented on 2025-04-30
slaren approved these changes on 2025-04-30
compilade commented on 2025-05-01
22bda486 kv-cache : separate recurrent vs non-recurrent impl (wip)
81457990 kv-cache : init -> constructor + add llama_memory_params
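As a rough illustration of the constructor-plus-params pattern described in 81457990, the sketch below collapses a two-phase init() into a constructor that takes a single params struct. The names (`mem_params`, `kv_cache_sketch`) are hypothetical stand-ins, not the actual llama.cpp declarations.

```cpp
// A sketch of constructor-based setup with a single params struct.
// These names are hypothetical, not the actual llama.cpp types.
#include <cstddef>
#include <cstdint>
#include <vector>

struct mem_params {           // illustrative analogue of a memory-params struct
    uint32_t n_ctx   = 4096;  // number of cells to reserve
    bool     offload = false; // whether buffers would live on a GPU backend
};

class kv_cache_sketch {
public:
    // all setup happens in the constructor -- no separate two-phase init()
    explicit kv_cache_sketch(const mem_params & params)
        : cells(params.n_ctx), offload(params.offload) {}

    std::size_t size() const { return cells.size(); }

private:
    std::vector<int> cells;   // stand-in for per-cell bookkeeping
    bool offload;
};

int main() {
    kv_cache_sketch cache(mem_params{8192, /*offload=*/true});
    return cache.size() == 8192 ? 0 : 1;
}
```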
49aa8b83 kv-cache : fix callback reference
838b3cca context : llama_kv_cache -> llama_memory_i
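838b3cca points the context at a more general memory interface. A minimal sketch of that shape follows, assuming hypothetical names (`memory_i`, `kv_cache_unified_sketch`, `kv_cache_recurrent_sketch`) rather than the real llama.cpp classes: callers hold one abstract interface, and the unified and recurrent caches differ only behind it.

```cpp
// A minimal sketch of one abstract memory interface with a unified and a
// recurrent implementation behind it. Names are illustrative only.
#include <memory>

struct memory_i {                                  // illustrative abstract interface
    virtual ~memory_i() = default;
    virtual void clear() = 0;                      // drop all stored state
    virtual bool seq_rm(int seq_id, int p0, int p1) = 0; // remove a range of a sequence
};

struct kv_cache_unified_sketch : memory_i {        // classic attention KV cache
    void clear() override {}
    bool seq_rm(int, int, int) override { return true; } // arbitrary ranges are fine
};

struct kv_cache_recurrent_sketch : memory_i {      // Mamba/RWKV-style state cache
    void clear() override {}
    bool seq_rm(int, int p0, int p1) override {
        // recurrent state cannot be partially rewound in this sketch, so only
        // whole-sequence removal succeeds
        return p0 <= 0 && p1 < 0;
    }
};

// callers hold the interface and never branch on the concrete type
std::unique_ptr<memory_i> make_memory(bool recurrent) {
    if (recurrent) {
        return std::make_unique<kv_cache_recurrent_sketch>();
    }
    return std::make_unique<kv_cache_unified_sketch>();
}

int main() {
    auto mem = make_memory(/*recurrent=*/true);
    return mem->seq_rm(0, 0, -1) ? 0 : 1; // whole-sequence removal succeeds
}
```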
8e4d3baa context : move memory creation logic to model
7fec0814 llama : remove reference of memory during encode
59af92bb kv-cache : hide padding details in the implementation
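For 59af92bb, a tiny sketch of what "hiding padding in the implementation" can look like: callers ask the cache for its effective size instead of rounding sizes up themselves. The `get_n` helper and the pad granularity of 32 are assumptions for illustration, not the actual llama.cpp code.

```cpp
// A sketch of keeping padding as an implementation detail behind the cache.
#include <cstdint>
#include <cstdio>

class padded_cache_sketch {
public:
    explicit padded_cache_sketch(uint32_t n_tokens) : n_tokens(n_tokens) {}

    // the padding granularity lives here, not in the calling code
    uint32_t get_n(uint32_t pad = 32) const {
        return ((n_tokens + pad - 1) / pad) * pad;
    }

private:
    uint32_t n_tokens;
};

int main() {
    padded_cache_sketch cache(70);
    std::printf("effective cells: %u\n", (unsigned) cache.get_n()); // prints 96
    return 0;
}
```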
6413b937 kv-cache : add ubatch_next()
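6413b937 adds a ubatch_next() helper. The sketch below shows the general shape of such a splitter with hypothetical names: it hands out fixed-size micro-batches until the input batch is exhausted, and it is not the real llama.cpp implementation.

```cpp
// A sketch of a ubatch_next()-style splitter over a token batch.
#include <cstddef>
#include <cstdio>
#include <vector>

struct ubatch_sketch {
    const int * tokens;   // view into the parent batch
    size_t      n_tokens; // number of tokens in this micro-batch
};

class batch_splitter_sketch {
public:
    batch_splitter_sketch(const std::vector<int> & batch, size_t n_ubatch)
        : batch(batch), n_ubatch(n_ubatch) {}

    // returns the next micro-batch, or one with n_tokens == 0 when done
    ubatch_sketch ubatch_next() {
        const size_t remaining = batch.size() - pos;
        const size_t take = remaining < n_ubatch ? remaining : n_ubatch;
        ubatch_sketch ub { batch.data() + pos, take };
        pos += take;
        return ub;
    }

private:
    const std::vector<int> & batch;
    size_t n_ubatch;
    size_t pos = 0;
};

int main() {
    std::vector<int> batch(10, 1);
    batch_splitter_sketch split(batch, /*n_ubatch=*/4);
    for (ubatch_sketch ub = split.ubatch_next(); ub.n_tokens > 0; ub = split.ubatch_next()) {
        std::printf("micro-batch of %zu tokens\n", ub.n_tokens); // 4, 4, 2
    }
    return 0;
}
```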
e869515b context : simplify sbatch logic
ae2cd005 kv-cache : hide defrag logic in the implementation
fdb7206d context : hide kv cache details in implementation
13d69a52 build : fix
5ef7559a cont : another fix
6b50ba75 kv-cache : simplify interface (wip)
cb02ac80 kv-cache : use separate KV cell structs for unified/recurrent
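cb02ac80 splits the per-cell bookkeeping. The sketch below is a guess at why the two caches want different cell structs: an attention cache tracks a token position per cell, while a recurrent cache tracks where a sequence's rolling state lives. The field names are illustrative, not the exact llama.cpp structs.

```cpp
// A sketch of separate per-cell structs for unified vs recurrent caches.
#include <set>
#include <vector>

struct kv_cell_unified_sketch {
    int pos = -1;            // token position stored in this cell
    std::set<int> seq_ids;   // sequences that reference this cell
};

struct kv_cell_recurrent_sketch {
    int src  = -1;           // cell whose state this cell was copied from
    int tail = -1;           // last cell of the sequence's state chain
    std::set<int> seq_ids;
};

int main() {
    std::vector<kv_cell_unified_sketch>   unified_cells(1024); // one cell per token slot
    std::vector<kv_cell_recurrent_sketch> recurrent_cells(8);  // roughly one slot per sequence
    (void) unified_cells; (void) recurrent_cells;
    return 0;
}
```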
f584750d kv-cache : clean-up
458f2a5f model : better llama_model::create_model() signature
92e626bd kv-cache : fix recurrent seq_rm()
43cbf38b kv-cache : replace `struct callbacks` with `llama_model &`
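43cbf38b swaps a callbacks struct for a direct model reference. The sketch below shows that dependency style with hypothetical types: the cache queries the model it was constructed with instead of invoking caller-supplied hooks.

```cpp
// A sketch of holding a model reference instead of a struct of callbacks.
#include <cstdint>

struct model_sketch {
    uint32_t n_embd_k(int /*il*/) const { return 128; } // per-layer K width
    uint32_t n_embd_v(int /*il*/) const { return 128; } // per-layer V width
};

class kv_cache_sketch {
public:
    explicit kv_cache_sketch(const model_sketch & model) : model(model) {}

    uint32_t row_size(int il) const {
        // ask the model directly -- no callbacks struct to fill in
        return model.n_embd_k(il) + model.n_embd_v(il);
    }

private:
    const model_sketch & model; // must outlive the cache
};

int main() {
    model_sketch model;
    kv_cache_sketch cache(model);
    return cache.row_size(0) == 256 ? 0 : 1;
}
```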
66198324 kv-cache : replace `struct graph_params` with `llama_context &`
95a9f8b5 kv-cache : fix offload check
8737e655 context : avoid passing unique_ptr
c9bddfc0 kv-cache : avoid using the backends from the llama_context
09195eb2 kv-cache : more consistent debug logs [no ci]
58e1d40f kv-cache : do not pass the full llama_context for kv graphs
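58e1d40f stops handing the whole llama_context to the kv graph code. As a sketch of the idea (with hypothetical names), the graph builder below accepts only the narrow slice of parameters it actually reads.

```cpp
// A sketch of narrowing the graph builder's inputs instead of passing a
// whole context object.
#include <cstdint>

struct graph_ctx_sketch {   // minimal slice of state needed for a kv graph
    uint32_t n_ctx;
    uint32_t n_head_kv;
};

// before (sketch): void build_kv_graph(const whole_context & ctx);
// after  (sketch): only the narrow view is passed in
uint32_t build_kv_graph(const graph_ctx_sketch & g) {
    return g.n_ctx * g.n_head_kv; // stand-in for actual graph construction
}

int main() {
    const graph_ctx_sketch g { 4096, 8 };
    return build_kv_graph(g) > 0 ? 0 : 1;
}
```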
903e46f1 kv-cache : remove comment
ggerganov force pushed from 780d6fb1 229 days ago
00cde5fe kv-cache : ggml_rope_ext_inplace -> ggml_rope_ext
7e79a427 kv-cache : fix recurrent multi-user case
ggerganov force pushed to 7e79a427 229 days ago
5883c906 memory : remove comments [no ci]
ggerganov merged c642bc01 into master 229 days ago
ggerganov deleted the gg/llama-kv-cache-v6 branch 229 days ago
compilade commented on 2025-05-02
