llama.cpp
kv-cache : separate recurrent vs non-recurrent impl
#12799
Merged

ggerganov merged 29 commits into master from gg/llama-kv-cache-v6
ggerganov force pushed from 19eb81e0 246 days ago
ggerganov force pushed to d953616e 244 days ago
ggerganov force pushed from ed8942a3 to 2c3547e5 239 days ago
ggerganov marked this pull request as ready for review 237 days ago
ggerganov requested a review from slaren 237 days ago
ggerganov force pushed to d31e31da 237 days ago
ggerganov force pushed to dec80ace 236 days ago
ggerganov force pushed from dec80ace 233 days ago
ggerganov force pushed to 65cde6d4 233 days ago
ggerganov force pushed 231 days ago
ggerganov force pushed to 7e4b5459 231 days ago
ggerganov force pushed to eb623f2f 231 days ago
slaren commented on 2025-04-30
ggerganov commented on 2025-04-30
slaren commented on 2025-04-30
slaren approved these changes on 2025-04-30
compilade commented on 2025-05-01
22bda486 kv-cache : separate recurrent vs non-recurrent impl (wip)
81457990 kv-cache : init -> constructor + add llama_memory_params
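As a rough illustration of the constructor-plus-params pattern described in 81457990, the sketch below collapses a two-phase init() into a constructor that takes a single params struct. The names (`mem_params`, `kv_cache_sketch`) are hypothetical stand-ins, not the actual llama.cpp declarations.

```cpp
// A sketch of constructor-based setup with a single params struct.
// These names are hypothetical, not the actual llama.cpp types.
#include <cstddef>
#include <cstdint>
#include <vector>

struct mem_params {           // illustrative analogue of a memory-params struct
    uint32_t n_ctx   = 4096;  // number of cells to reserve
    bool     offload = false; // whether buffers would live on a GPU backend
};

class kv_cache_sketch {
public:
    // all setup happens in the constructor -- no separate two-phase init()
    explicit kv_cache_sketch(const mem_params & params)
        : cells(params.n_ctx), offload(params.offload) {}

    std::size_t size() const { return cells.size(); }

private:
    std::vector<int> cells;   // stand-in for per-cell bookkeeping
    bool offload;
};

int main() {
    kv_cache_sketch cache(mem_params{8192, /*offload=*/true});
    return cache.size() == 8192 ? 0 : 1;
}
```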
49aa8b83 kv-cache : fix callback reference
838b3cca context : llama_kv_cache -> llama_memory_i
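838b3cca points the context at a more general memory interface. A minimal sketch of that shape follows, assuming hypothetical names (`memory_i`, `kv_cache_unified_sketch`, `kv_cache_recurrent_sketch`) rather than the real llama.cpp classes: callers hold one abstract interface, and the unified and recurrent caches differ only behind it.

```cpp
// A minimal sketch of one abstract memory interface with a unified and a
// recurrent implementation behind it. Names are illustrative only.
#include <memory>

struct memory_i {                                  // illustrative abstract interface
    virtual ~memory_i() = default;
    virtual void clear() = 0;                      // drop all stored state
    virtual bool seq_rm(int seq_id, int p0, int p1) = 0; // remove a range of a sequence
};

struct kv_cache_unified_sketch : memory_i {        // classic attention KV cache
    void clear() override {}
    bool seq_rm(int, int, int) override { return true; } // arbitrary ranges are fine
};

struct kv_cache_recurrent_sketch : memory_i {      // Mamba/RWKV-style state cache
    void clear() override {}
    bool seq_rm(int, int p0, int p1) override {
        // recurrent state cannot be partially rewound in this sketch, so only
        // whole-sequence removal succeeds
        return p0 <= 0 && p1 < 0;
    }
};

// callers hold the interface and never branch on the concrete type
std::unique_ptr<memory_i> make_memory(bool recurrent) {
    if (recurrent) {
        return std::make_unique<kv_cache_recurrent_sketch>();
    }
    return std::make_unique<kv_cache_unified_sketch>();
}

int main() {
    auto mem = make_memory(/*recurrent=*/true);
    return mem->seq_rm(0, 0, -1) ? 0 : 1; // whole-sequence removal succeeds
}
```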
8e4d3baa context : move memory creation logic to model
7fec0814 llama : remove reference of memory during encode
59af92bb kv-cache : hide padding details in the implementation
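For 59af92bb, a tiny sketch of what "hiding padding in the implementation" can look like: callers ask the cache for its effective size instead of rounding sizes up themselves. The `get_n` helper and the pad granularity of 32 are assumptions for illustration, not the actual llama.cpp code.

```cpp
// A sketch of keeping padding as an implementation detail behind the cache.
#include <cstdint>
#include <cstdio>

class padded_cache_sketch {
public:
    explicit padded_cache_sketch(uint32_t n_tokens) : n_tokens(n_tokens) {}

    // the padding granularity lives here, not in the calling code
    uint32_t get_n(uint32_t pad = 32) const {
        return ((n_tokens + pad - 1) / pad) * pad;
    }

private:
    uint32_t n_tokens;
};

int main() {
    padded_cache_sketch cache(70);
    std::printf("effective cells: %u\n", (unsigned) cache.get_n()); // prints 96
    return 0;
}
```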
6413b937 kv-cache : add ubatch_next()
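6413b937 adds a ubatch_next() helper. The sketch below shows the general shape of such a splitter with hypothetical names: it hands out fixed-size micro-batches until the input batch is exhausted, and it is not the real llama.cpp implementation.

```cpp
// A sketch of a ubatch_next()-style splitter over a token batch.
#include <cstddef>
#include <cstdio>
#include <vector>

struct ubatch_sketch {
    const int * tokens;   // view into the parent batch
    size_t      n_tokens; // number of tokens in this micro-batch
};

class batch_splitter_sketch {
public:
    batch_splitter_sketch(const std::vector<int> & batch, size_t n_ubatch)
        : batch(batch), n_ubatch(n_ubatch) {}

    // returns the next micro-batch, or one with n_tokens == 0 when done
    ubatch_sketch ubatch_next() {
        const size_t remaining = batch.size() - pos;
        const size_t take = remaining < n_ubatch ? remaining : n_ubatch;
        ubatch_sketch ub { batch.data() + pos, take };
        pos += take;
        return ub;
    }

private:
    const std::vector<int> & batch;
    size_t n_ubatch;
    size_t pos = 0;
};

int main() {
    std::vector<int> batch(10, 1);
    batch_splitter_sketch split(batch, /*n_ubatch=*/4);
    for (ubatch_sketch ub = split.ubatch_next(); ub.n_tokens > 0; ub = split.ubatch_next()) {
        std::printf("micro-batch of %zu tokens\n", ub.n_tokens); // 4, 4, 2
    }
    return 0;
}
```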
e869515b context : simplify sbatch logic
ae2cd005 kv-cache : hide defrag logic in the implementation
fdb7206d context : hide kv cache details in implementation
13d69a52 build : fix
5ef7559a cont : another fix
6b50ba75 kv-cache : simplify interface (wip)
cb02ac80 kv-cache : use separate KV cell structs for unified/recurrent
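cb02ac80 splits the per-cell bookkeeping. The sketch below is a guess at why the two caches want different cell structs: an attention cache tracks a token position per cell, while a recurrent cache tracks where a sequence's rolling state lives. The field names are illustrative, not the exact llama.cpp structs.

```cpp
// A sketch of separate per-cell structs for unified vs recurrent caches.
#include <set>
#include <vector>

struct kv_cell_unified_sketch {
    int pos = -1;            // token position stored in this cell
    std::set<int> seq_ids;   // sequences that reference this cell
};

struct kv_cell_recurrent_sketch {
    int src  = -1;           // cell whose state this cell was copied from
    int tail = -1;           // last cell of the sequence's state chain
    std::set<int> seq_ids;
};

int main() {
    std::vector<kv_cell_unified_sketch>   unified_cells(1024); // one cell per token slot
    std::vector<kv_cell_recurrent_sketch> recurrent_cells(8);  // roughly one slot per sequence
    (void) unified_cells; (void) recurrent_cells;
    return 0;
}
```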
f584750d kv-cache : clean-up
458f2a5f model : better llama_model::create_model() signature
92e626bd kv-cache : fix recurrent seq_rm()
43cbf38b kv-cache : replace `struct callbacks` with `llama_model &`
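43cbf38b swaps a callbacks struct for a direct model reference. The sketch below shows that dependency style with hypothetical types: the cache queries the model it was constructed with instead of invoking caller-supplied hooks.

```cpp
// A sketch of holding a model reference instead of a struct of callbacks.
#include <cstdint>

struct model_sketch {
    uint32_t n_embd_k(int /*il*/) const { return 128; } // per-layer K width
    uint32_t n_embd_v(int /*il*/) const { return 128; } // per-layer V width
};

class kv_cache_sketch {
public:
    explicit kv_cache_sketch(const model_sketch & model) : model(model) {}

    uint32_t row_size(int il) const {
        // ask the model directly -- no callbacks struct to fill in
        return model.n_embd_k(il) + model.n_embd_v(il);
    }

private:
    const model_sketch & model; // must outlive the cache
};

int main() {
    model_sketch model;
    kv_cache_sketch cache(model);
    return cache.row_size(0) == 256 ? 0 : 1;
}
```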
66198324 kv-cache : replace `struct graph_params` with `llama_context &`
95a9f8b5 kv-cache : fix offload check
8737e655 context : avoid passing unique_ptr
c9bddfc0 kv-cache : avoid using the backends from the llama_context
09195eb2 kv-cache : more consistent debug logs [no ci]
58e1d40f kv-cache : do not pass the full llama_context for kv graphs
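58e1d40f stops handing the whole llama_context to the kv graph code. As a sketch of the idea (with hypothetical names), the graph builder below accepts only the narrow slice of parameters it actually reads.

```cpp
// A sketch of narrowing the graph builder's inputs instead of passing a
// whole context object.
#include <cstdint>

struct graph_ctx_sketch {   // minimal slice of state needed for a kv graph
    uint32_t n_ctx;
    uint32_t n_head_kv;
};

// before (sketch): void build_kv_graph(const whole_context & ctx);
// after  (sketch): only the narrow view is passed in
uint32_t build_kv_graph(const graph_ctx_sketch & g) {
    return g.n_ctx * g.n_head_kv; // stand-in for actual graph construction
}

int main() {
    const graph_ctx_sketch g { 4096, 8 };
    return build_kv_graph(g) > 0 ? 0 : 1;
}
```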
903e46f1 kv-cache : remove comment
ggerganov force pushed from 780d6fb1 229 days ago
00cde5fe kv-cache : ggml_rope_ext_inplace -> ggml_rope_ext
7e79a427 kv-cache : fix recurrent multi-user case
ggerganov force pushed to 7e79a427 229 days ago
5883c906 memory : remove comments [no ci]
ggerganov merged c642bc01 into master 229 days ago
ggerganov deleted the gg/llama-kv-cache-v6 branch 229 days ago
compilade commented on 2025-05-02
