PR #22284 server: router fix model unload reload deadlock

0cc4m wants to merge 15 commits into master from 0cc4m/server-router-fix-reload-deadlock

server: add --models-memory-max parameter to allow dynamically unload…

8e8e2007

estimate with to-be-loaded model size included

777395f6

use no_alloc to get memory requirements for model load

2603b4c5

only set model memory_mb if not previously calculated

9b5af58a

use memory margin instead of total size limit, apply to each device s…

56122b35

add server memory debug logging

51538c1f

move llama_context_device_memory function to llama-ext.h

ba2521c6

fix model count exceeded check

75000630

improve memory_per_device map naming

173da43c

improve variable naming, fix style

69e30861

also strip models memory margin from child processes

eb2cf73f

cont : clean-up

1a8aec0a

handle models that need to be downloaded before estimation

b1623a61

load directly from downloaded state

cf0ebc4e

server: keep router model refcount to avoid unloading models that hav…

a5355a02
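
Note on the last commit: its title is truncated, so the exact condition it guards against is not visible here. As a rough illustration of the general technique it names (a per-model reference count that blocks unloading while the model is in use), here is a minimal sketch. Everything in it is hypothetical: `model_registry`, `acquire`, `release`, and `try_unload` are illustrative names, not identifiers from this PR or from the llama.cpp server code.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical registry, for illustration only; not the PR's data structure.
class model_registry {
public:
    // Register a model as loaded, with no users yet.
    void load(const std::string & name) {
        std::lock_guard<std::mutex> lock(mtx);
        refcount.emplace(name, 0);
    }

    // Take a reference before routing work to the model.
    // Returns false if the model is not loaded.
    bool acquire(const std::string & name) {
        std::lock_guard<std::mutex> lock(mtx);
        auto it = refcount.find(name);
        if (it == refcount.end()) {
            return false;
        }
        ++it->second;
        return true;
    }

    // Drop a reference once the work is finished.
    void release(const std::string & name) {
        std::lock_guard<std::mutex> lock(mtx);
        auto it = refcount.find(name);
        assert(it != refcount.end() && it->second > 0);
        --it->second;
    }

    // Unload only when nothing holds a reference. This never waits while
    // holding the lock; the caller decides whether to retry later.
    bool try_unload(const std::string & name) {
        std::lock_guard<std::mutex> lock(mtx);
        auto it = refcount.find(name);
        if (it == refcount.end() || it->second > 0) {
            return false;
        }
        refcount.erase(it);
        return true;
    }

private:
    std::mutex mtx;
    std::map<std::string, int> refcount; // model name -> active references
};
```

In this sketch the unload path reports failure instead of blocking, so a caller that wants to free memory for another model can pick a different candidate or retry, rather than waiting on a model that is still busy.
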

github-actions added labels: examples, python, server
