llama.cpp
e5155e69 - server : export max observed n_past value (#15361)

Commit
135 days ago
server : export max observed n_past value (#15361)

Add tracking for high-watermark cache usage and expose it through the /metrics endpoint.

Use case: tracking the largest cache usage actually needed under a realistic workload, to better understand memory requirements and to adjust the cache size and the model/cache quantization accordingly.
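The diff itself is not shown here, so the following is only a minimal sketch of the high-watermark idea the message describes, not the code from this commit. It assumes n_past is the number of tokens a slot already holds in its context. The struct cache_metrics, its methods, and the metric name example_n_past_max are hypothetical names chosen for illustration, and reporting the value as a gauge is likewise an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the server's metrics state (not the real llama.cpp struct).
struct cache_metrics {
    int32_t n_past_max = 0; // largest n_past observed so far (the high watermark)

    // Call whenever a slot's n_past changes, e.g. after a batch has been processed.
    // The stored value only ever grows, so it records the peak rather than the current usage.
    void observe(int32_t n_past) {
        assert(n_past >= 0);
        n_past_max = std::max(n_past_max, n_past);
    }

    // Render the value in Prometheus text exposition format.
    // The metric name is illustrative only, not the name used by the server.
    std::string to_prometheus() const {
        const std::string name = "example_n_past_max";
        return "# HELP " + name + " Largest observed n_past.\n"
               "# TYPE " + name + " gauge\n" +
               name + " " + std::to_string(n_past_max) + "\n";
    }
};
```

In this sketch the caller would invoke observe() once per slot update and return to_prometheus() as part of the metrics response. After a representative workload, the reported peak can be compared against the configured context size to judge how much cache is really needed.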