llama : reuse compute graphs (#14482)

Commit

310 days ago

llama : reuse compute graphs (#14482)

* llama : reuse compute graphs

ggml-ci

* llama-bench : add graph reuse parameter

ggml-ci

* cont : remove the parameter and the sched resets

ggml-ci

* graph : rename update() to can_reuse()

ggml-ci

* params : remove is_same()

ggml-ci

* graph : set res->params in llm_graph_context constructor

ggml-ci

* graph : avoid set_max_nodes in llm_graph_result

ggml-ci

* kv-cache : reuse llama_context's graph result instance

ggml-ci

* context : reset the previous graph result upon memory updates

ggml-ci

* batch : llama_ubatch now carries its data instead of pointing to balloc

ggml-ci

* merge : fix build

ggml-ci

* graph : fix can_reuse() checks when flash-attention is disabled

* graph : move llm_graph_result impl in source file + debug env

ggml-ci
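
The commit message names two mechanisms: a `can_reuse()` check on the previous graph result, and a reset of that result when memory is updated. The sketch below illustrates that general caching pattern only. Every type, field, and method in it (`graph_params`, `graph_result`, `context`, `process`, `on_memory_update`) is a hypothetical stand-in invented for illustration, not the llama.cpp API, and the real reuse conditions are likely more involved than a plain parameter comparison.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the parameters a graph was built with.
// These fields are assumptions for illustration, not llama.cpp's.
struct graph_params {
    uint32_t n_tokens;
    uint32_t n_outputs;
    bool     flash_attn;

    bool operator==(const graph_params & other) const {
        return n_tokens   == other.n_tokens  &&
               n_outputs  == other.n_outputs &&
               flash_attn == other.flash_attn;
    }
};

// Hypothetical stand-in for a built graph that remembers its parameters.
struct graph_result {
    graph_params params;

    // The cached graph is reusable only if the next batch would produce
    // a graph with the same parameters.
    bool can_reuse(const graph_params & next) const {
        return params == next;
    }
};

// Hypothetical context that keeps the previous graph result around.
class context {
public:
    // Returns true if the previous graph was reused, false if rebuilt.
    bool process(const graph_params & p) {
        if (prev && prev->can_reuse(p)) {
            ++n_reused;
            return true;
        }
        prev = std::make_unique<graph_result>(graph_result{p});
        ++n_built;
        return false;
    }

    // A memory update invalidates the cached graph, so drop it.
    void on_memory_update() {
        prev.reset();
    }

    int n_built  = 0;
    int n_reused = 0;

private:
    std::unique_ptr<graph_result> prev;
};
```

In this sketch, two consecutive calls to `process()` with identical parameters build the graph once and reuse it once; calling `on_memory_update()` in between forces a rebuild on the next call.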

References

#14482 - llama : reuse compute graphs

Author

ggerganov

Parents

086cf81e

llama.cpp 01612b74 - llama : reuse compute graphs (#14482)