server: add router device memory margin parameter for dynamic unloading #21231
0cc4m
requested a review
79 days ago
0cc4m
requested a review
79 days ago
ngxson
commented
on 2026-03-31
ngxson
commented
on 2026-03-31
0cc4m
changed the title from "server: add router max memory parameter for dynamic unloading" to "server: add router device memory margin parameter for dynamic unloading" 77 days ago
0cc4m
force pushed
from
4312ed2a
to
1d4a5f93
76 days ago
0cc4m
force pushed
from
0124ec9e
to
3c53be14
66 days ago
0cc4m
force pushed
from
61c25687
to
cf0ebc4e
58 days ago
0cc4m
force pushed
from
cf0ebc4e
to
da1f1688
48 days ago
ngxson
commented
on 2026-05-13
0cc4m
force pushed
from
da1f1688
to
d65d956b
34 days ago
0cc4m
force pushed
from
d65d956b
to
0bb8e548
34 days ago
0cc4m
force pushed
from
5fa97b12
to
6adf9643
28 days ago
danbev
commented
on 2026-05-21
0cc4m
force pushed
from
6adf9643
to
82403fdc
19 days ago
0cc4m
force pushed
from
82403fdc
to
645d17ea
10 days ago
0cc4m
force pushed
from
645d17ea
to
37a767f5
7 days ago
server: add --models-memory-max parameter to allow dynamically unload…
7369ec06
estimate with to-be-loaded model size included
b6fa94e3
use no_alloc to get memory requirements for model load
1019aadc
only set model memory_mb if not previously calculated
4f2efcbb
use memory margin instead of total size limit, apply to each device s…
2c733776
add server memory debug logging
6ecddda7
move llama_context_device_memory function to llama-ext.h
626fd177
fix model count exceeded check
5b2ab4ed
improve memory_per_device map naming
9c6e4218
improve variable naming, fix style
719ab427
also strip models memory margin from child processes
f24b9d53
cont : clean-up
0ade9b8c
replace device memory map with buft memory map. Use llama_get_memory_…
c5847772
extract duplicated check into helper function
bd39f024
move model memory estimation to subprocess
c03917ad
precompute name->buft map, map GPU host types to CPU buft
1201397c
cleanup unused variable
db5d5b3b
remove duplicated init calls
d1b5a682
0cc4m
force pushed
from
37a767f5
to
d1b5a682
15 hours ago