[Frontend] Add multi-server frontend for K8s pod health aggregation
When running N vLLM API servers inside a single Kubernetes pod, a
shared SO_REUSEPORT setup means each K8s health probe reaches only one
(arbitrary) server. If any backend crashes, the pod can remain
partially live.
This PR adds --multi-server-frontend: a lightweight FastAPI process that
runs on the main port (K8s-facing) and:
1. Aggregates /health across all N backends — returns 200 only when
every backend is healthy, so liveness/startup probes work correctly.
2. Monitors backend processes and exits with code 1 if any crash,
triggering a K8s pod restart instead of leaving a degraded pod.
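The two behaviors above can be sketched roughly as follows. This is a
minimal illustration only; `aggregate_health` and `check_backends` are
hypothetical helper names, not the actual code in frontend.py:

```python
import sys
from http import HTTPStatus


def aggregate_health(backend_statuses: list[int]) -> int:
    """Return 200 only when every backend's /health returned 200,
    else 503 so K8s liveness/startup probes see the pod as unhealthy."""
    ok = all(status == HTTPStatus.OK for status in backend_statuses)
    return int(HTTPStatus.OK if ok else HTTPStatus.SERVICE_UNAVAILABLE)


def check_backends(procs) -> None:
    """Exit with code 1 if any backend subprocess has died;
    Popen.poll() returns the exit code once a process terminates."""
    for proc in procs:
        if proc.poll() is not None:
            sys.exit(1)
```

Exiting the frontend process (rather than just reporting unhealthy) lets
K8s restart the whole pod instead of leaving it degraded.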
Port layout:
--port → frontend (K8s-facing)
--port+1 … --port+N → vLLM backend servers (pod-internal)
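That layout implies a simple port assignment, sketched here with a
hypothetical `compute_ports` helper (illustrative, not code from this PR):

```python
def compute_ports(base_port: int, num_servers: int) -> tuple[int, list[int]]:
    """Frontend binds base_port; backend i binds base_port + i for i in 1..N."""
    frontend_port = base_port
    backend_ports = [base_port + i for i in range(1, num_servers + 1)]
    return frontend_port, backend_ports
```

For example, `--port 8000` with 2 backends puts the frontend on 8000 and
the backends on 8001 and 8002.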
New files:
vllm/entrypoints/openai/frontend.py
Modified:
vllm/entrypoints/openai/cli_args.py (add --multi-server-frontend flag)
vllm/entrypoints/cli/serve.py (add run_multi_api_server_with_frontend)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>