PR #11213 llama : refactor llama_kv_cache, llama_context and llm_build_context

llama : add struct llama_kv_cache (wip) [no ci]

ggerganov committed 1 year ago

llama : cont

ggerganov committed 1 year ago

kv_cache : functions -> members

ggerganov committed 1 year ago

kv_cache : fix

ggerganov committed 1 year ago

kv_cache : minor

ggerganov committed 1 year ago

context : prepare kv_cache_read/write to be moved to kv_cache

ggerganov committed 1 year ago

kv_cache : move state read/write to llama_kv_cache

ggerganov committed 1 year ago

llama : update llama_kv_self API

ggerganov committed 1 year ago

context : minor

ggerganov committed 1 year ago

llama : fix names [no ci]

ggerganov committed 1 year ago

llama : remove references to llama_kv_cache (wip)

ggerganov committed 1 year ago

cont : move kv_self update to llama_context

ggerganov committed 1 year ago

context : add get_ctx_padding()

ggerganov committed 1 year ago

context : move adapter code in the implementation [no ci]

ggerganov committed 1 year ago

context : initial need_reserve logic

ggerganov committed 1 year ago

wip

ggerganov committed 1 year ago

context : introduce llama_batch_manager

ggerganov committed 1 year ago

context : prepare for abstraction

ggerganov committed 1 year ago

Merge branch 'master' into gg/llama-kv-cache

ggerganov committed 1 year ago

llama : resolve rwkv conflict

ggerganov committed 1 year ago

Merge branch 'master' into gg/llama-kv-cache

ggerganov committed 1 year ago

Merge branch 'master' into gg/llama-kv-cache

ggerganov committed 1 year ago

Merge branch 'master' into gg/llama-kv-cache

ggerganov committed 1 year ago

context : store graph build function callback

ggerganov committed 1 year ago

Merge branch 'master' into gg/llama-kv-cache

ggerganov committed 1 year ago

llama : fix rwkv inference (#11618)

MollySophia committed 1 year ago

llama : clear whitespaces

ggerganov committed 1 year ago

Merge branch 'master' into gg/llama-kv-cache

ggerganov committed 1 year ago

kv-cache : fix defrag condition

ggerganov committed 1 year ago

Merge branch 'master' into gg/llama-kv-cache

ggerganov committed 1 year ago

llama : dedup reserve code

ggerganov committed 1 year ago

server : increase context size for the tests

ggerganov committed 1 year ago

context : add decode/encode

ggerganov committed 1 year ago

bman : remove ubatch member

ggerganov committed 1 year ago

context : make output functions members

ggerganov committed 1 year ago

context : initial abstraction

ggerganov committed 1 year ago

context : move encode/decode to llama-context.cpp

ggerganov committed 1 year ago

context : improve llama_context encapsulation

ggerganov committed 1 year ago

context : minor naming fix

ggerganov committed 1 year ago

context : move build_rope_factors to base class

ggerganov committed 1 year ago

context : introduce llama_graph_i

ggerganov committed 1 year ago

context : prepare llama_model graph build

ggerganov committed 1 year ago

llama : models now build their graphs using llama_graph_i

ggerganov committed 1 year ago

graph : restore ubatch in build_cb

ggerganov committed 1 year ago

context : rename to llama_context_kv_self

ggerganov committed 1 year ago

llama : introduce llama_io interfaces

ggerganov committed 1 year ago

context : abstract state read/write

ggerganov committed 1 year ago

context : minor cleanup

ggerganov committed 1 year ago

context : move output functionality to base class

ggerganov committed 1 year ago

context : abstract input

ggerganov committed 1 year ago

context : abstract constructor and init

ggerganov committed 1 year ago

context : remove batch_manager

ggerganov committed 1 year ago

context : move common inputs to base class

ggerganov committed 1 year ago

graph : update attn/kv_self names

ggerganov committed 1 year ago

Merge branch 'master' into gg/llama-kv-cache

ggerganov committed 1 year ago

graph : add llama_graph_result

ggerganov committed 1 year ago

cont : return important tensors

ggerganov committed 1 year ago

cont : use returned tensors from the graph build

ggerganov committed 1 year ago

llama : reorder encode/decode in sources

ggerganov committed 1 year ago

context : minor simplify

ggerganov committed 1 year ago

model : pass llama_graph_i as ptr

ggerganov committed 1 year ago

kv-cache : prepare for abstraction

ggerganov committed 1 year ago

kv-cache : remove llama_kv_cache_i

ggerganov committed 1 year ago

context : add llama_context_recurrent

ggerganov committed 1 year ago

graph : simplify attention api

ggerganov committed 1 year ago

model : fix order kvq -> qkv

ggerganov committed 1 year ago

Merge branch 'master' into gg/llama-kv-cache

ggerganov committed 1 year ago

context : add cache-less llama_context

ggerganov committed 1 year ago

context : fix causal input for cache-less case

ggerganov committed 1 year ago

context : add llama_kv_cache_recurrent prototype

ggerganov committed 1 year ago

context : add save/load for recurrent context

ggerganov committed 1 year ago

graph : remove worst_case from the API

ggerganov committed 1 year ago

context : add logs

ggerganov committed 1 year ago

context : wrap input tensors in struct

ggerganov committed 1 year ago

context : fix n_outputs init

ggerganov committed 1 year ago

Merge branch 'master' into gg/llama-kv-cache

ggerganov committed 1 year ago

wip enc-dec

ggerganov committed 1 year ago

cont : enc should work now, next is dec

ggerganov committed 1 year ago

graph : remove the build_kv_... API from llama_graph_i

ggerganov committed 1 year ago

context : remove redundant virtual, protected -> private

ggerganov committed 1 year ago

context : fix recurrent reserve

ggerganov committed 1 year ago

context : reuse built_attn_mha

ggerganov committed 1 year ago

context : explicit llama_context_i abstract interface

ggerganov committed 1 year ago

enc-dec : compose wip

ggerganov committed 1 year ago

context : enc-dec is now working

ggerganov committed 1 year ago

context : fix enc-dec state save/load

ggerganov committed 1 year ago

context : pass embeddings tensor from encoder to decoder

ggerganov committed 1 year ago

context : disable encoder embd tensor for now

ggerganov committed 1 year ago

Merge branch 'master' into gg/llama-kv-cache

ggerganov committed 1 year ago

kv-cache : basic abstraction

ggerganov committed 1 year ago

llama : introduce concept of llama_memory

ggerganov committed 1 year ago

context : decouple inputs, llama_graph_i become const (WIP)

ggerganov committed 1 year ago

cont : migrate the rest of the inputs out of llama_context

ggerganov committed 1 year ago

graph : move non-context related logic to llm_build_context

ggerganov committed 1 year ago

graph : add comments

ggerganov committed 1 year ago

llama.cpp llama : refactor llama_kv_cache, llama_context and llm_build_context #11213 Closed
