llama.cpp
server: add router device memory margin parameter for dynamic unloading
#21231
Open

0cc4m wants to merge 18 commits into master from 0cc4m/server-memory-limit
0cc4m
0cc4m requested a review 79 days ago
0cc4m requested a review 79 days ago
ngxson
ngxson commented on 2026-03-31
ngxson
ngxson commented on 2026-03-31
github-actions added the examples label
github-actions added the server label
0cc4m requested a review from ggerganov 77 days ago
ggerganov
0cc4m
0cc4m changed the title from "server: add router max memory parameter for dynamic unloading" to "server: add router device memory margin parameter for dynamic unloading" 77 days ago
0cc4m
ggerganov
ggerganov commented on 2026-04-02
0cc4m force pushed from 4312ed2a to 1d4a5f93 76 days ago
ggerganov
ggerganov commented on 2026-04-03
0cc4m
0cc4m requested a review from ggerganov 69 days ago
0cc4m requested a review from ngxson 69 days ago
ggerganov
ggerganov assigned ggerganov 68 days ago
0cc4m force pushed from 0124ec9e to 3c53be14 66 days ago
ggerganov
ggerganov commented on 2026-04-16
0cc4m force pushed from 61c25687 to cf0ebc4e 58 days ago
0cc4m force pushed from cf0ebc4e to da1f1688 48 days ago
MGAndreasen
0cc4m
ggerganov
ggerganov commented on 2026-05-04
ggerganov
ggerganov commented on 2026-05-04
ggerganov
ggerganov commented on 2026-05-04
ngxson
ngxson commented on 2026-05-13
0cc4m force pushed from da1f1688 to d65d956b 34 days ago
0cc4m force pushed from d65d956b to 0bb8e548 34 days ago
0cc4m
0cc4m
ggerganov
0cc4m force pushed from 5fa97b12 to 6adf9643 28 days ago
0cc4m
danbev
danbev commented on 2026-05-21
ServeurpersoCom
ServeurpersoCom
0cc4m
ServeurpersoCom
ServeurpersoCom
0cc4m
ServeurpersoCom
ServeurpersoCom
ORippler
0cc4m force pushed from 6adf9643 to 82403fdc 19 days ago
0cc4m force pushed from 82403fdc to 645d17ea 10 days ago
0cc4m
0cc4m force pushed from 645d17ea to 37a767f5 7 days ago
ggerganov
ggerganov commented on 2026-06-16
0cc4m
0cc4m server: add --models-memory-max parameter to allow dynamically unload…
7369ec06
0cc4m estimate with to-be-loaded model size included
b6fa94e3
0cc4m use no_alloc to get memory requirements for model load
1019aadc
0cc4m only set model memory_mb if not previously calculated
4f2efcbb
0cc4m use memory margin instead of total size limit, apply to each device s…
2c733776
0cc4m add server memory debug logging
6ecddda7
0cc4m move llama_context_device_memory function to llama-ext.h
626fd177
0cc4m fix model count exceeded check
5b2ab4ed
0cc4m improve memory_per_device map naming
9c6e4218
0cc4m improve variable naming, fix style
719ab427
0cc4m also strip models memory margin from child processes
f24b9d53
ggerganov cont : clean-up
0ade9b8c
0cc4m replace device memory map with buft memory map. Use llama_get_memory_…
c5847772
0cc4m extract duplicated check into helper function
bd39f024
0cc4m move model memory estimation to subprocess
c03917ad
0cc4m precompute name->buft map, map GPU host types to CPU buft
1201397c
0cc4m cleanup unused variable
db5d5b3b
0cc4m remove duplicated init calls
d1b5a682
0cc4m force pushed from 37a767f5 to d1b5a682 15 hours ago