kv-cache : separate recurrent vs non-recurrent impl #12799
ggerganov force-pushed from 19eb81e0 246 days ago
ggerganov force-pushed to d953616e 244 days ago
ggerganov force-pushed from ed8942a3 to 2c3547e5 239 days ago
ggerganov marked this pull request as ready for review 237 days ago
ggerganov force-pushed to d31e31da 237 days ago
ggerganov force-pushed to dec80ace 236 days ago
ggerganov force-pushed from dec80ace 233 days ago
ggerganov force-pushed to 65cde6d4 233 days ago
ggerganov force-pushed to 7e4b5459 231 days ago
ggerganov force-pushed to eb623f2f 231 days ago
slaren commented on 2025-04-30
slaren commented on 2025-04-30
slaren approved these changes on 2025-04-30
22bda486 kv-cache : serparate recurrent vs non-recurrent impl (wip)
81457990 kv-cache : init -> contructor + add llama_memory_params
49aa8b83 kv-cache : fix callback reference
838b3cca context : llama_kv_cache -> llama_memory_i
8e4d3baa context : move memory creation logic to model
7fec0814 llama : remove reference of memory during encode
59af92bb kv-cache : hide padding details in the implementation
6413b937 kv-cache : add ubatch_next()
e869515b context : simplify sbatch logic
ae2cd005 kv-cache : hide defrag logic in the implementation
fdb7206d context : hide kv cache details in implementation
13d69a52 build : fix
5ef7559a cont : another fix
6b50ba75 kv-cache : simplify interface (wip)
cb02ac80 kv-cache : use separate KV cell structs for unified/recurrent
f584750d kv-cache : clean-up
458f2a5f model : better llama_model::create_model() signature
92e626bd kv-cache : fix recurrent seq_rm()
43cbf38b kv-cache : replace `struct callbacks` with `llama_model &`
66198324 kv-cache : replace `struct graph_params` with `llama_context &`
95a9f8b5 kv-cache : fix offload check
8737e655 context : avoid passing unique_ptr
c9bddfc0 kv-cache : avoid using the backends from the llama_context
09195eb2 kv-cache : more consistent debug logs [no ci]
58e1d40f kv-cache : do not pass the full llama_context for kv graphs
903e46f1 kv-cache : remove comment
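The commits above describe splitting one KV-cache type into separate unified (attention) and recurrent (state) implementations behind a common memory interface, with separate cell structs for each. As a rough illustration of that shape, here is a minimal, self-contained C++ sketch; all names (`memory_i`, `kv_cache_unified`, `kv_cache_recurrent`) and members are hypothetical stand-ins, not llama.cpp's actual API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical common memory interface, loosely analogous to the
// llama_memory_i role mentioned in the commit log.
struct memory_i {
    virtual ~memory_i() = default;
    virtual bool seq_rm(int seq_id) = 0; // drop all state for a sequence
    virtual int  used() const       = 0; // number of occupied slots
};

// Unified cache sketch: one cell per cached token position,
// as used by standard attention layers.
struct kv_cache_unified : memory_i {
    struct cell { int pos = -1; int seq_id = -1; }; // per-token cell
    std::vector<cell> cells;

    explicit kv_cache_unified(int n) : cells(n) {}

    bool insert(int seq_id, int pos) {
        for (auto & c : cells) {
            if (c.pos < 0) { c.pos = pos; c.seq_id = seq_id; return true; }
        }
        return false; // cache full
    }
    bool seq_rm(int seq_id) override {
        for (auto & c : cells) {
            if (c.seq_id == seq_id) { c = cell(); }
        }
        return true;
    }
    int used() const override {
        int n = 0;
        for (const auto & c : cells) { if (c.pos >= 0) n++; }
        return n;
    }
};

// Recurrent cache sketch: one rolling state slot per sequence,
// no per-token cells at all.
struct kv_cache_recurrent : memory_i {
    std::vector<float> state;  // placeholder for the per-seq recurrent state
    std::vector<bool>  active;

    explicit kv_cache_recurrent(int n_seq) : state(n_seq, 0.0f), active(n_seq, false) {}

    bool update(int seq_id, float s) {
        if (seq_id < 0 || seq_id >= (int) active.size()) return false;
        state[seq_id]  = s;
        active[seq_id] = true;
        return true;
    }
    bool seq_rm(int seq_id) override {
        if (seq_id < 0 || seq_id >= (int) active.size()) return false;
        active[seq_id] = false;
        return true;
    }
    int used() const override {
        int n = 0;
        for (bool a : active) { if (a) n++; }
        return n;
    }
};
```

The point of the split is visible even in this toy version: removing a sequence from the unified cache frees token cells, while removing it from the recurrent cache just invalidates one state slot, so the two cannot share an implementation cleanly.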
ggerganov force-pushed from 780d6fb1 229 days ago
00cde5fe kv-cache : ggml_rope_ext_inplace -> ggml_rope_ext
7e79a427 kv-cache : fix recurrent multi-user case
ggerganov force-pushed to 7e79a427 229 days ago
5883c906 memory : remove comments [no ci]
ggerganov merged commit c642bc01 into master 229 days ago
ggerganov deleted the gg/llama-kv-cache-v6 branch 229 days ago