llama.cpp
server: router fix model unload reload deadlock #22284 (Open)

0cc4m wants to merge 15 commits into master from 0cc4m/server-router-fix-reload-deadlock
Commits (15):
8e8e2007 (0cc4m) server: add --models-memory-max parameter to allow dynamically unload…
777395f6 (0cc4m) estimate with to-be-loaded model size included
2603b4c5 (0cc4m) use no_alloc to get memory requirements for model load
9b5af58a (0cc4m) only set model memory_mb if not previously calculated
56122b35 (0cc4m) use memory margin instead of total size limit, apply to each device s…
51538c1f (0cc4m) add server memory debug logging
ba2521c6 (0cc4m) move llama_context_device_memory function to llama-ext.h
75000630 (0cc4m) fix model count exceeded check
173da43c (0cc4m) improve memory_per_device map naming
69e30861 (0cc4m) improve variable naming, fix style
eb2cf73f (0cc4m) also strip models memory margin from child processes
1a8aec0a (ggerganov) cont : clean-up
b1623a61 (0cc4m) handle models that need to be downloaded before estimation
cf0ebc4e (0cc4m) load directly from downloaded state
a5355a02 (0cc4m) server: keep router model refcount to avoid unloading models that hav…
github-actions added labels: examples, python, server
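The last commit, a5355a02, keeps a router model refcount to avoid unloading certain models (its title is truncated on this page). As a generic illustration of refcount-guarded unloading, here is a minimal C++ sketch assuming a single mutex-protected map from model name to in-flight user count. `ModelRefRegistry`, `acquire`, `release`, and `can_unload` are hypothetical names, not llama.cpp's actual router API, and the sketch does not show how this PR resolves the deadlock itself.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical registry (not llama.cpp code): counts how many in-flight
// users currently hold each model, keyed by model name.
class ModelRefRegistry {
public:
    // Called when a request starts using a model.
    void acquire(const std::string & name) {
        std::lock_guard<std::mutex> lock(mutex_);
        ++refs_[name];
    }

    // Called when the request is done with the model.
    void release(const std::string & name) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = refs_.find(name);
        assert(it != refs_.end() && it->second > 0);  // unbalanced release is a bug
        if (it == refs_.end()) {
            return;  // stay safe when asserts are compiled out
        }
        if (--it->second == 0) {
            refs_.erase(it);  // no holders left
        }
    }

    // True only when nobody holds the model, i.e. unloading it is safe.
    bool can_unload(const std::string & name) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return refs_.find(name) == refs_.end();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::string, int> refs_;
};
```

In a real router the `can_unload` check and the unload itself would need to happen under the same lock (or the unload would need to re-check the count); otherwise a request could acquire the model between the check and the unload. This sketch only shows the counting.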

Reviewers: no reviews
Assignees: no one assigned