PR #11213 llama : refactor llama_kv_cache, llama_context and llm_build_context

github-actions added examples

github-actions added server

ggerganov force pushed 1 year ago

ggerganov force pushed to fb740247 1 year ago

github-actions added android

ggerganov force pushed to 9027f329 1 year ago

ggerganov commented on 2025-01-16

slaren commented on 2025-01-16

ggerganov commented on 2025-01-16

ggerganov marked this pull request as ready for review 1 year ago

ggerganov requested a review from ngxson 1 year ago

ggerganov changed the title ~~llama : add struct llama_kv_cache~~ llama : refactor llama_kv_cache, llama_context and llm_build_context 1 year ago

ggerganov force pushed from 60106c62 1 year ago

ggerganov marked this pull request as draft 1 year ago

ggerganov force pushed to a47d389c 1 year ago

llama : add struct llama_kv_cache (wip) [no ci]

f78b396e

llama : cont

e4550fba

kv_cache : functions -> members

4d7bd03e

kv_cache : fix

fef90cb3

kv_cache : minor

73a14ecc

context : prepare kv_cache_read/write to be moved to kv_cache

4cd1b6fa

kv_cache : move state read/write to llama_kv_cache

fd05ab87

llama : update llama_kv_self API

17b363af

context : minor

a19f671f

llama : fix names [no ci]

ae274f97

llama : remove references to llama_kv_cache (wip)

f2524c0e

cont : move kv_self update to llama_context

b4ec1d44

context : add get_ctx_padding()

f0713498

context : move adapter code in the implementation [no ci]

c75ba685

context : initial need_reserve logic

133ad6a7

wip

cb8f2095

context : introduce llama_batch_manager

99422dfa

context : prepare for abstraction

a0c500b4

ggerganov force pushed from a47d389c to a0c500b4 1 year ago

Merge branch 'master' into gg/llama-kv-cache

e665b57f

llama : resolve rwkv conflict

91888569

Merge branch 'master' into gg/llama-kv-cache

c30e34cd

Merge branch 'master' into gg/llama-kv-cache

a40ba49f

MollySophia requested a review from MollySophia 1 year ago

Merge branch 'master' into gg/llama-kv-cache

5d3491e7

context : store graph build function callback

3e23be79

ggerganov force pushed to 3e23be79 1 year ago

Merge branch 'master' into gg/llama-kv-cache

74b08072

llama : fix rwkv inference (#11618)

1eca8916

llama : clear whitespaces

e0d913fc

Merge branch 'master' into gg/llama-kv-cache

0f1c1cab

kv-cache : fix defrag condition

b15fede7

ggerganov force pushed to b15fede7 1 year ago

Merge branch 'master' into gg/llama-kv-cache

972f91c7

llama : dedup reserve code

f9971ef2

server : increase context size for the tests

879ba827

github-actions added python

context : add decode/encode

ef358ee7

ggerganov force pushed to ef358ee7 1 year ago

bman : remove ubatch member

d1d8d530

context : make output functions members

2cd8a903

context : initial abstraction

02ef4be9

ggerganov force pushed to 02ef4be9 1 year ago

context : move encode/decode to llama-context.cpp

b52b79b0

context : improve llama_context encapsulation

8da7f612

ggerganov force pushed to 8da7f612 1 year ago

context : minor naming fix

d146a14f

context : move build_rope_factors to base class

5eae8e51

context : introduce llama_graph_i

e633dc17

context : prepare llama_model graph build

0ab50f1b

ggerganov force pushed to 0ab50f1b 1 year ago

llama : models now build their graphs using llama_graph_i

f63aeecc

graph : restore ubatch in build_cb

6ee86e5e

context : rename to llama_context_kv_self

fbe6a072

llama : introduce llama_io interfaces

3a504d9a

ggerganov force pushed to 3a504d9a 1 year ago

context : abstract state read/write

f7c7757b

context : minor cleanup

e08f38df

context : move output functionality to base class

107d1e2c

context : abstract input

ed3cb55a

context : abstract constructor and init

131743ff

ggerganov force pushed to 131743ff 1 year ago

context : remove batch_manager

d5e8e1a2

context : move common inputs to base class

82806456

graph : update attn/kv_self names

1d801d27

Merge branch 'master' into gg/llama-kv-cache

f0d3ff23

graph : add llama_graph_result

c2359031

cont : return important tensors

172f6169

ggerganov force pushed to 172f6169 1 year ago

cont : use returned tensors from the graph build

bc6f187e

llama : reorder encode/decode in sources

befe14f0

context : minor simplify

9e50456e

model : pass llama_graph_i as ptr

2bffc2d5

kv-cache : prepare for abstraction

f5cedbca

ggerganov force pushed to f5cedbca 1 year ago

kv-cache : remove llama_kv_cache_i

5f11a550

ggerganov force pushed 1 year ago

context : add llama_context_recurrent

e17e4b72

ggerganov force pushed to e17e4b72 1 year ago

graph : simplify attention api

2eacb4c1

model : fix order kvq -> qkv

f95b04a2

Merge branch 'master' into gg/llama-kv-cache

072280ea

ggerganov force pushed 1 year ago

context : add cache-less llama_context

b1554be1

ggerganov force pushed to b1554be1 1 year ago

context : fix causal input for cache-less case

ad870c49

ggerganov force pushed to ad870c49 1 year ago

context : add llama_kv_cache_recurrent prototype

08011c2c

context : add save/load for recurrent context

2645a7d9

graph : remove worst_case from the API

548c230d

context : add logs

ebf1bdf9

context : wrap input tensors in struct

f588a70d

context : fix n_outputs init

3753b30d

Merge branch 'master' into gg/llama-kv-cache

c4c0a4d1

wip enc-dec

f5e80208

ngxson commented on 2025-02-22

cont : enc should work now, next is dec

372fa3a8

graph : remove the build_kv_... API from llama_graph_i

6378112c

context : remove redundant virtual, protected -> private

0699a44c

context : fix recurrent reserve

a5a85a3b

context : reuse built_attn_mha

4a1054b5

context : explicit llama_context_i abstract interface

9cd78f11

enc-dec : compose wip

be58e300

context : enc-dec is now working

e5bc5f8e

context : fix enc-dec state save/load

e2b3294f

context : pass embeddings tensor from encoder to decoder

4efe9898

ggerganov commented on 2025-02-25

context : disable encoder embd tensor for now

952feedf

Merge branch 'master' into gg/llama-kv-cache

82675a01

kv-cache : basic abstraction

828effd9

ggerganov force pushed to 828effd9 1 year ago

llama : introduce concept of llama_memory

38db8a58

context : decouple inputs, llama_graph_i become const (WIP)

7f02ee56

ggerganov force pushed to 7f02ee56 1 year ago

cont : migrate the rest of the inputs out of llama_context

9cab53c7

graph : move non-context related logic to llm_build_context

0f7daa9d

graph : add comments

624f7bd0

ggerganov closed this 1 year ago
