server: add auto-sleep after N seconds of idle #18228
implement sleeping at queue level
e1d7b434
implement server-context suspend
197e5785
add test
db3b78d2
add docs
aea8f8c1
ngxson marked this pull request as ready for review 24 days ago
optimization: add fast path
44a5a26c
make sure to free llama_init
e6ab62c4
nits
937b0641
fix use-after-free
105e2f3c
allow /models to be accessed during sleeping, fix use-after-free
fd09f880
don't allow accessing /models during sleep, it is not thread-safe
0bb9bc48
fix data race on accessing props and model_meta
d8500827
small clean up
1663d2f8
trailing whitespace
b51da9a1
rm outdated comments
06a5ebe1
Assignees
No one assigned
Labels
examples
python
server