[Feat][NIXL] Add KV lease refresh mechanism for disaggregated prefill

Replace the static prefill-side (P) KV block timeout with an active
lease mechanism. Decode (D) workers periodically POST to
`/internal/nixl/lease_refresh` to extend the hold window while requests
sit in the D queue. This prevents premature block expiry on bursty
workloads without requiring a large static timeout.
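
The D-side refresh loop can be pictured with the minimal sketch below. It is not the code in this change: `LeaseRefresher`, `pending_ids`, and `post` are hypothetical names, the HTTP call is abstracted behind an injected callable, and the `max(1, ...)` guard against a zero interval is an assumption. Only the `timeout // 3` cadence comes from this commit.

```python
import threading
from typing import Callable, List


def refresh_interval(timeout_s: int) -> int:
    """Refresh cadence: a third of the lease timeout (floor division).

    The lower bound of 1 second is an assumption, added so a tiny
    timeout cannot produce a zero-second busy loop.
    """
    return max(1, timeout_s // 3)


class LeaseRefresher:
    """Hypothetical D-side helper that periodically refreshes leases."""

    def __init__(
        self,
        timeout_s: int,
        pending_ids: Callable[[], List[str]],
        post: Callable[[List[str]], None],
    ) -> None:
        self._interval = refresh_interval(timeout_s)
        self._pending_ids = pending_ids  # returns requests still queued on D
        self._post = post                # stands in for the HTTP POST to P
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def refresh_once(self) -> None:
        """Send one refresh for all pending requests, if there are any."""
        ids = self._pending_ids()
        if ids:
            self._post(ids)

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._interval):
            self.refresh_once()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

Refreshing at a third of the timeout leaves room for a refresh to be delayed or lost once or twice before the lease lapses.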
- D scheduler tracks pending remote-prefill requests in
`_requires_lease_dict`; a background thread POSTs refreshes every
`timeout // 3` seconds
- P scheduler receives refreshes via new EngineCore utility method
`nixl_lease_refresh`, stores updated expiry in `_lease_refreshes`,
and passes them to the P worker through `NixlConnectorMetadata`
- P worker applies refreshes in `start_load_kv`; `get_finished` now
scans every tracked request instead of breaking at the first
unexpired one, since a refresh can change the expiry order
- New `VLLM_NIXL_HTTP_PORT` env var (default 8000) lets D locate P's
HTTP server; P includes it in `kv_transfer_params`
- New FastAPI route registered unconditionally in `build_app`; no-op
on non-NIXL instances
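
Why the full scan matters can be shown with a minimal sketch of the P-side bookkeeping. The function names and the plain `dict` of expiry times are hypothetical and are not the structures used in this change; taking the `max` of old and new expiry and ignoring unknown request IDs are assumptions as well.

```python
from typing import Dict, List


def apply_refreshes(leases: Dict[str, float], refreshes: Dict[str, float]) -> None:
    """Extend the expiry of known requests; never shorten a lease."""
    for req_id, new_expiry in refreshes.items():
        if req_id in leases:
            leases[req_id] = max(leases[req_id], new_expiry)


def pop_expired_full_scan(leases: Dict[str, float], now: float) -> List[str]:
    """Check every entry, because refreshes may have reordered expiries."""
    expired = [req_id for req_id, expiry in leases.items() if expiry <= now]
    for req_id in expired:
        del leases[req_id]
    return expired


def pop_expired_early_break(leases: Dict[str, float], now: float) -> List[str]:
    """Shown for contrast: only correct while insertion order matches
    expiry order, which holds for a static timeout but not for leases."""
    expired = []
    for req_id, expiry in leases.items():
        if expiry > now:
            break
        expired.append(req_id)
    for req_id in expired:
        del leases[req_id]
    return expired
```

With leases `{"a": 10, "b": 20}` and a refresh that moves `a` to 40, checking at time 25 with the early-break variant stops at `a` and misses `b`, while the full scan returns `b`.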

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>