llama.cpp
0c58ba33 - rpc : reuse compute graph buffers (#21299)

Commit
41 days ago
Reuse the buffer for the ggml context which is used for creating the compute graph on the server side. This partially addresses a memory leak created by the CUDA backend due to using buffer addresses as cache keys.

ref: #21265
ref: #20315
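
To illustrate why reusing a buffer helps when a cache is keyed by buffer address, here is a minimal, self-contained sketch. It is not the code from this commit and does not use the ggml or CUDA APIs: `AddressKeyedCache`, `cache_size_fresh_buffers`, and `cache_size_reused_buffer` are hypothetical names invented for the illustration. The fresh-buffer variant deliberately keeps every buffer alive so that each address is guaranteed to be distinct, which is a simplification of what an allocator would do in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a backend cache that uses a buffer address as its key.
struct AddressKeyedCache {
    std::unordered_map<const void *, int> entries;

    // Record one use of the buffer at `key`; an unseen address creates a new entry.
    void touch(const void * key) { entries[key] += 1; }

    std::size_t size() const { return entries.size(); }
};

// One new buffer per request. All buffers stay alive until the function returns
// (an assumption made here to force distinct addresses), so every request adds
// a new cache entry and the cache grows with the number of requests.
std::size_t cache_size_fresh_buffers(int n_requests, std::size_t buf_size) {
    assert(n_requests >= 0);
    AddressKeyedCache cache;
    std::vector<std::unique_ptr<std::uint8_t[]>> live;
    for (int i = 0; i < n_requests; ++i) {
        live.push_back(std::make_unique<std::uint8_t[]>(buf_size));
        cache.touch(live.back().get());
    }
    return cache.size();
}

// One buffer allocated up front and reused for every request. The address is
// stable, so the cache holds at most one entry no matter how many requests run.
std::size_t cache_size_reused_buffer(int n_requests, std::size_t buf_size) {
    assert(n_requests >= 0);
    AddressKeyedCache cache;
    std::vector<std::uint8_t> buf(buf_size);
    for (int i = 0; i < n_requests; ++i) {
        buf.resize(buf_size);  // same size as before, so no reallocation occurs
        cache.touch(buf.data());
    }
    return cache.size();
}
```

Calling `cache_size_fresh_buffers(100, 64)` and `cache_size_reused_buffer(100, 64)` side by side shows the difference: the first cache ends up with one entry per request, the second with a single entry. As the commit message notes, the real change only partially addresses the leak, so this sketch should be read as the general idea rather than a complete fix.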