llama.cpp
0c58ba33 - rpc : reuse compute graph buffers (#21299)

Commit

41 days ago

rpc : reuse compute graph buffers (#21299)

Reuse the buffer for the ggml context which is used for creating the compute graph on the server side. This partially addresses a memory leak created by the CUDA backend due to using buffer addresses as cache keys.

ref: #21265
ref: #20315
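The effect described in the commit message can be illustrated with a small, self-contained sketch. It does not reproduce the actual RPC server or CUDA backend code: `GraphScratch`, `AddressKeyedCache`, and `simulate_reuse` are hypothetical names, and the cache is a simplified stand-in for any structure that uses buffer addresses as keys. The idea is that a scratch buffer which is kept across requests, and only grown when needed, keeps presenting the same address, so an address-keyed cache stops accumulating new entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical scratch buffer that outlives individual requests.
struct GraphScratch {
    std::vector<std::uint8_t> buf;

    // Return storage of at least `size` bytes.
    // The vector only reallocates when it has to grow, so repeated
    // requests of the same (or a smaller) size see the same address.
    void * acquire(std::size_t size) {
        if (buf.size() < size) {
            buf.resize(size);
        }
        return buf.data();
    }
};

// Simplified stand-in for a cache keyed by buffer address (assumption:
// the real backend cache is more involved than a set of pointers).
struct AddressKeyedCache {
    std::unordered_set<const void *> keys;

    void touch(const void * p) { keys.insert(p); }
    std::size_t size() const { return keys.size(); }
};

// Serve `n_requests` graph computations of identical size from one
// reused scratch buffer and report how many distinct cache keys result.
std::size_t simulate_reuse(int n_requests, std::size_t size) {
    GraphScratch      scratch;
    AddressKeyedCache cache;
    for (int i = 0; i < n_requests; ++i) {
        cache.touch(scratch.acquire(size));
    }
    return cache.size();
}
```

With a reused buffer the number of distinct keys stays at one regardless of how many requests are served. A request that needs a larger buffer can still force a reallocation and therefore a new address, which is consistent with the commit describing the fix as partial.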

References

#21299 - rpc : reuse compute graph buffers

Author

rgerganov

Parents

57ace0d6
