llama.cpp
llama : refactor llama_kv_cache, llama_context and llm_build_context
#11213
Closed

ggerganov wants to merge 95 commits into master from gg/llama-kv-cache
ggerganov
github-actions added examples
github-actions added server
ggerganov
ggerganov force pushed 1 year ago
ggerganov force pushed 1 year ago
ggerganov force pushed to fb740247 1 year ago
github-actions added android
ggerganov force pushed to 9027f329 1 year ago
ggerganov
ggerganov commented on 2025-01-16
slaren
slaren commented on 2025-01-16
ggerganov
ggerganov commented on 2025-01-16
ggerganov marked this pull request as ready for review 1 year ago
ggerganov requested a review from ngxson 1 year ago
ggerganov
ggerganov changed the title from "llama : add struct llama_kv_cache" to "llama : refactor llama_kv_cache, llama_context and llm_build_context" 1 year ago
slaren
ggerganov force pushed from 60106c62 1 year ago
ggerganov marked this pull request as draft 1 year ago
ggerganov force pushed to a47d389c 1 year ago
ggerganov
ggerganov llama : add struct llama_kv_cache (wip) [no ci]
f78b396e
ggerganov llama : cont
e4550fba
ggerganov kv_cache : functions -> members
4d7bd03e
ggerganov kv_cache : fix
fef90cb3
ggerganov kv_cache : minor
73a14ecc
ggerganov context : prepare kv_cache_read/write to be moved to kv_cache
4cd1b6fa
ggerganov kv_cache : move state read/write to llama_kv_cache
fd05ab87
ggerganov llama : update llama_kv_self API
17b363af
ggerganov context : minor
a19f671f
ggerganov llama : fix names [no ci]
ae274f97
ggerganov llama : remove references to llama_kv_cache (wip)
f2524c0e
ggerganov cont : move kv_self update to llama_context
b4ec1d44
ggerganov context : add get_ctx_padding()
f0713498
ggerganov context : move adapter code in the implementation [no ci]
c75ba685
ggerganov context : initial need_reserve logic
133ad6a7
ggerganov wip
cb8f2095
ggerganov context : introduce llama_batch_manager
99422dfa
ggerganov context : prepare for abstraction
a0c500b4
ggerganov force pushed from a47d389c to a0c500b4 1 year ago
ggerganov Merge branch 'master' into gg/llama-kv-cache
e665b57f
fairydreaming
ggerganov
ggerganov llama : resolve rwkv conflict
91888569
ggerganov Merge branch 'master' into gg/llama-kv-cache
c30e34cd
ggerganov Merge branch 'master' into gg/llama-kv-cache
a40ba49f
MollySophia requested a review from MollySophia 1 year ago
ggerganov Merge branch 'master' into gg/llama-kv-cache
5d3491e7
ggerganov
ggerganov context : store graph build function callback
3e23be79
ggerganov force pushed to 3e23be79 1 year ago
ggerganov Merge branch 'master' into gg/llama-kv-cache
74b08072
MollySophia
ggerganov
MollySophia llama : fix rwkv inference (#11618)
1eca8916
ggerganov llama : clear whitespaces
e0d913fc
ggerganov Merge branch 'master' into gg/llama-kv-cache
0f1c1cab
ggerganov kv-cache : fix defrag condition
b15fede7
ggerganov force pushed to b15fede7 1 year ago
ggerganov Merge branch 'master' into gg/llama-kv-cache
972f91c7
ggerganov llama : dedup reserve code
f9971ef2
ggerganov server : increase context size for the tests
879ba827
github-actions added python
ggerganov context : add decode/encode
ef358ee7
ggerganov force pushed to ef358ee7 1 year ago
ggerganov bman : remove ubatch member
d1d8d530
ggerganov context : make output functions members
2cd8a903
ggerganov context : initial abstraction
02ef4be9
ggerganov force pushed to 02ef4be9 1 year ago
ggerganov context : move encode/decode to llama-context.cpp
b52b79b0
ggerganov context : improve llama_context encapsulation
8da7f612
ggerganov force pushed to 8da7f612 1 year ago
ggerganov context : minor naming fix
d146a14f
ggerganov context : move build_rope_factors to base class
5eae8e51
ggerganov context : introduce llama_graph_i
e633dc17
ggerganov context : prepare llama_model graph build
0ab50f1b
ggerganov force pushed to 0ab50f1b 1 year ago
ggerganov llama : models now build their graphs using llama_graph_i
f63aeecc
ggerganov graph : restore ubatch in build_cb
6ee86e5e
ggerganov context : rename to llama_context_kv_self
fbe6a072
ggerganov llama : introduce llama_io interfaces
3a504d9a
ggerganov force pushed to 3a504d9a 1 year ago
ggerganov context : abstract state read/write
f7c7757b
ggerganov context : minor cleanup
e08f38df
ggerganov context : move output functionality to base class
107d1e2c
ggerganov context : abstract input
ed3cb55a
ggerganov context : abstract constructor and init
131743ff
ggerganov force pushed to 131743ff 1 year ago
ggerganov context : remove batch_manager
d5e8e1a2
ggerganov context : move common inputs to base class
82806456
ggerganov graph : update attn/kv_self names
1d801d27
ggerganov Merge branch 'master' into gg/llama-kv-cache
f0d3ff23
ggerganov graph : add llama_graph_result
c2359031
ggerganov cont : return important tensors
172f6169
ggerganov force pushed to 172f6169 1 year ago
ggerganov cont : use returend tensors from the graph build
bc6f187e
ggerganov llama : reorder encode/decode in sources
befe14f0
ggerganov context : minor simplify
9e50456e
ggerganov model : pass llama_graph_i as ptr
2bffc2d5
ggerganov kv-cache : prepare for abstraction
f5cedbca
ggerganov force pushed to f5cedbca 1 year ago
ggerganov kv-cache : remove llama_kv_cache_i
5f11a550
ggerganov force pushed 1 year ago
ggerganov force pushed 1 year ago
ggerganov force pushed 1 year ago
ggerganov context : add llama_context_recurrent
e17e4b72
ggerganov force pushed to e17e4b72 1 year ago
ggerganov graph : simplify attention api
2eacb4c1
ggerganov model : fix order kvq -> qkv
f95b04a2
ggerganov Merge branch 'master' into gg/llama-kv-cache
072280ea
ggerganov force pushed 1 year ago
ggerganov force pushed 1 year ago
ggerganov context : add cache-less llama_context
b1554be1
ggerganov force pushed to b1554be1 1 year ago
ggerganov context : fix causal input for cache-less case
ad870c49
ggerganov force pushed to ad870c49 1 year ago
ggerganov context : add llama_kv_cache_recurrent prototype
08011c2c
ggerganov
ngxson
ggerganov context : add save/load for recurrent context
2645a7d9
ggerganov graph : remove worst_case from the API
548c230d
ggerganov context : add logs
ebf1bdf9
ggerganov context : wrap input tensors in struct
f588a70d
ggerganov context : fix n_outputs init
3753b30d
ggerganov Merge branch 'master' into gg/llama-kv-cache
c4c0a4d1
ggerganov wip enc-dec
f5e80208
fairydreaming
ngxson
ngxson commented on 2025-02-22
ngxson
ggerganov
ggerganov
ggerganov cont : enc should work now, next is dec
372fa3a8
fairydreaming
ggerganov graph : remove the build_kv_... API from llama_graph_i
6378112c
ggerganov context : remove redundant virtual, protected -> private
0699a44c
ggerganov context : fix recurrent reserve
a5a85a3b
ggerganov context : reuse built_attn_mha
4a1054b5
ggerganov context : explicit llama_context_i abstract interface
9cd78f11
ggerganov enc-dec : compose wip
be58e300
ggerganov context : enc-dec is now working
e5bc5f8e
ggerganov context : fix enc-dec state save/load
e2b3294f
ggerganov context : pass embeddings tensor from encoder to decoder
4efe9898
ggerganov
ggerganov commented on 2025-02-25
MollySophia
fairydreaming
ggerganov
ggerganov context : disable encoder embd tensor for now
952feedf
ggerganov Merge branch 'master' into gg/llama-kv-cache
82675a01
ggerganov kv-cache : basic abstraction
828effd9
ggerganov force pushed to 828effd9 1 year ago
ggerganov llama : introduce concept of llama_memory
38db8a58
fairydreaming
ggerganov
ggerganov context : decouple inputs, llama_graph_i become const (WIP)
7f02ee56
ggerganov force pushed to 7f02ee56 1 year ago
ggerganov cont : migrate the rest of the inputs out of llama_context
9cab53c7
ggerganov graph : move non-context related logic to llm_build_context
0f7daa9d
ggerganov graph : add comments
624f7bd0
ggerganov
fairydreaming
ggerganov
ggerganov closed this 1 year ago
artiomborovinskii
