server: add router device memory margin parameter for dynamic unloading #21231
0cc4m
requested a review
72 days ago
0cc4m
requested a review
72 days ago
ngxson
commented
on 2026-03-31
ngxson
commented
on 2026-03-31
0cc4m
changed the title from "server: add router max memory parameter for dynamic unloading" to "server: add router device memory margin parameter for dynamic unloading" 71 days ago
0cc4m
force pushed
from
4312ed2a
to
1d4a5f93
70 days ago
0cc4m
force pushed
from
0124ec9e
to
3c53be14
60 days ago
0cc4m
force pushed
from
61c25687
to
cf0ebc4e
51 days ago
0cc4m
force pushed
from
cf0ebc4e
to
da1f1688
41 days ago
ngxson
commented
on 2026-05-13
0cc4m
force pushed
from
da1f1688
to
d65d956b
28 days ago
0cc4m
force pushed
from
d65d956b
to
0bb8e548
28 days ago
0cc4m
force pushed
from
5fa97b12
to
6adf9643
22 days ago
danbev
commented
on 2026-05-21
0cc4m
force pushed
from
6adf9643
to
82403fdc
13 days ago
0cc4m
force pushed
from
82403fdc
to
645d17ea
4 days ago
server: add --models-memory-max parameter to allow dynamically unload…
34a9a7e5
estimate with to-be-loaded model size included
716cd77e
use no_alloc to get memory requirements for model load
d6dac7e9
only set model memory_mb if not previously calculated
40f8b387
use memory margin instead of total size limit, apply to each device s…
fdca28e9
add server memory debug logging
3fe090f2
move llama_context_device_memory function to llama-ext.h
7a266473
fix model count exceeded check
91b0d08c
improve memory_per_device map naming
8973faab
improve variable naming, fix style
fdfda6b5
also strip models memory margin from child processes
bdd79f03
cont : clean-up
a45085e9
handle models that need to be downloaded before estimation
b4c56304
load directly from downloaded state
ffd27c69
replace device memory map with buft memory map. Use llama_get_memory_…
efb55e71
extract duplicated check into helper function
7d58d31c
move model memory estimation to subprocess
44139bc7
precompute name->buft map, map GPU host types to CPU buft
409120f5
cleanup unused variable
ca54eda6
remove duplicated init calls
37a767f5
0cc4m
force pushed
from
645d17ea
to
37a767f5
23 hours ago