vllm
38c00afb - [Feat][NIXL] Add KV lease refresh mechanism for disaggregated prefill

Commit
61 days ago
[Feat][NIXL] Add KV lease refresh mechanism for disaggregated prefill

Replace the static P-side KV block timeout with an active lease mechanism. D workers periodically POST /internal/nixl/lease_refresh to extend the hold window while requests sit in the D queue, preventing premature block expiry on bursty workloads without requiring a large static timeout.

- D scheduler tracks pending remote-prefill requests in `_requires_lease_dict`; a background thread POSTs refreshes every `timeout // 3` seconds
- P scheduler receives refreshes via new EngineCore utility method `nixl_lease_refresh`, stores updated expiry in `_lease_refreshes`, and passes them to the P worker through `NixlConnectorMetadata`
- P worker applies refreshes in `start_load_kv` and does a full scan (not early-break) in `get_finished` since expiry order may change
- New `VLLM_NIXL_HTTP_PORT` env var (default 8000) lets D locate P's HTTP server; P includes it in `kv_transfer_params`
- New FastAPI route registered unconditionally in `build_app`; no-op on non-NIXL instances

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
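The lease idea described in the commit message can be illustrated with a small standalone sketch. This is not the vLLM implementation: `LeaseTable`, `FakeClock`, `hold`, `refresh`, and `pop_expired` are hypothetical names chosen for illustration, and the `max(1, ...)` guard on the refresh interval is an added assumption. The sketch shows why expiry detection needs a full scan once refreshes are allowed: a refreshed request can outlive one that was registered later, so insertion order no longer matches expiry order.

```python
import time
from typing import Callable


def refresh_interval(timeout_s: int) -> int:
    """Refresh period for the D side: a third of the lease timeout.

    The max(1, ...) guard is an assumption added here so that very small
    timeouts do not yield a zero-second interval.
    """
    return max(1, timeout_s // 3)


class LeaseTable:
    """Hypothetical P-side table mapping request id -> lease expiry time."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self._expiry: dict[str, float] = {}

    def hold(self, req_id: str) -> None:
        """Start holding KV blocks for a request with a fresh lease."""
        self._expiry[req_id] = self.clock() + self.timeout_s

    def refresh(self, req_id: str) -> bool:
        """Extend the lease; returns False if the request is not held."""
        if req_id not in self._expiry:
            return False
        self._expiry[req_id] = self.clock() + self.timeout_s
        return True

    def pop_expired(self) -> list[str]:
        """Full scan: refreshes can reorder expiries, so check every entry
        instead of stopping at the first unexpired one."""
        now = self.clock()
        expired = [r for r, t in self._expiry.items() if t <= now]
        for r in expired:
            del self._expiry[r]
        return expired


class FakeClock:
    """Manually advanced clock so the demo is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
table = LeaseTable(timeout_s=30, clock=clock)
table.hold("req-a")            # lease expires at t=30
clock.now = 5
table.hold("req-b")            # lease expires at t=35
clock.now = 25
table.refresh("req-a")         # lease now expires at t=55
clock.now = 40
expired = table.pop_expired()  # req-b has expired; req-a is still held
```

In this run `req-a` was registered first but expires last. A scan that walks entries in insertion order and breaks at the first unexpired one would stop at `req-a` and never release `req-b`, which is the situation the full scan avoids.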
Author
Robert Shaw