PR #13746 kv-cache : refactor + add llama_memory_state_i

ggerganov merged 14 commits into master from gg/kv-cache-simplify-part3

github-actions added the examples label

github-actions added the server label

ggerganov force pushed to 8323e238 1 year ago

Base automatically changed from gg/kv-cache-simplify-part2 to master 1 year ago

ggerganov force pushed to 1eec34ad 1 year ago

ggerganov marked this pull request as ready for review 1 year ago

ggerganov requested a review from ngxson 1 year ago

ggerganov requested a review from slaren 1 year ago

ggerganov commented on 2025-05-25

ggerganov force pushed to 0b73da5a 1 year ago

ggerganov force pushed from 0b73da5a to 2252eefd 1 year ago

ggerganov marked this pull request as draft 1 year ago

gabe-l-hart commented on 2025-05-27

ggerganov force pushed 1 year ago

ggerganov force pushed to a3ebf0aa 1 year ago

slaren commented on 2025-05-28

ggerganov force pushed to a592c137 1 year ago

gabe-l-hart commented on 2025-05-28

ggerganov force pushed 1 year ago

ggerganov force pushed to eed741e9 1 year ago

slaren approved these changes on 2025-05-29

ggerganov force pushed from 9548d2a1 1 year ago

ggerganov force pushed 1 year ago

ggerganov force pushed to 2b984f41 1 year ago

ggerganov marked this pull request as ready for review 1 year ago

kv-cache : simplify the "struct llama_kv_cache" interface

773b6e39

kv-cache : revert the (n_swa + n_ubatch) change (for next PR)

9fc50dcd

kv-cache : some comments

c2c35917

context : fix graph reserve for multiple sequences

88567820

kv-cache : fix typo [no ci]

bffb9d4a

kv-cache : fix find_slot() logic for free slots

32cc9eab

llama : add TODO for deprecating the defrag API in the future

f97de9b7

kv-cache : improve find_slot() using min/max seq pos info

7764d914

llama : handle aborts and compute errors

780bba94

memory : extract state into llama_memory_state

dbcfa5f1

kv-cache : add comments

f2ded9d4

server : update batching logic to reset n_batch on successful decode

e230e514

server : upon full re-processing, remove the sequence from the cache

3cf51863

kv-cache : add TODO for doing split_equal when split_simple fails

71619f2d

ggerganov force pushed from f23e4cca to 71619f2d 1 year ago

ggerganov changed the title from "kv-cache : simplify" to "kv-cache : refactor + add llama_memory_state_i" 1 year ago

ggerganov merged 12d0188c into master 1 year ago

ggerganov deleted the gg/kv-cache-simplify-part3 branch 1 year ago

Reviewers

slaren

gabe-l-hart

compilade

ngxson

Labels

examples server

