server: add --models-memory-max parameter to allow dynamically unloading models when they exceed a memory size threshold
estimate memory usage with to-be-loaded model size included
use no_alloc to get memory requirements for model load
only set model memory_mb if not previously calculated
use memory margin instead of total size limit, apply to each device separately
add server memory debug logging
move llama_context_device_memory function to llama-ext.h
fix model count exceeded check
improve memory_per_device map naming
improve variable naming, fix style
also strip models memory margin from child processes
cont : clean-up
replace device memory map with buft memory map, use llama_get_memory_breakdown
extract duplicated check into helper function
move model memory estimation to subprocess
precompute name->buft map, map GPU host types to CPU buft
cleanup unused variable
remove duplicated init calls
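
For illustration only, the per-buffer-type margin check described above (add the to-be-loaded model's estimate, then compare against free memory separately for each buffer type) could look roughly like the sketch below. The names memory_map, exceeds_margin, free_mb, needed_mb and margin_mb are hypothetical and are not taken from the actual patch; the real code obtains its numbers via llama_get_memory_breakdown.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical type: buffer type name -> size in MiB.
using memory_map = std::map<std::string, uint64_t>;

// Returns true if loading a model that needs `needed_mb` would leave less
// than `margin_mb` free on at least one buffer type. A buffer type missing
// from `free_mb` is treated as having no free memory (an assumption of this
// sketch, not necessarily what the server does).
static bool exceeds_margin(const memory_map & free_mb,
                           const memory_map & needed_mb,
                           uint64_t           margin_mb) {
    for (const auto & [buft, need] : needed_mb) {
        const auto it = free_mb.find(buft);
        const uint64_t avail = it == free_mb.end() ? 0 : it->second;
        if (need + margin_mb > avail) {
            return true; // this buffer type would drop below the margin
        }
    }
    return false;
}
```

In a scheme like this, a true result would be the signal to unload other models before loading the new one.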