text-generation-inference
[Backend] Introduce vLLM backend
#2976
Open

Commits
  • backend(vllm): initial commit
    mfuntowicz committed 338 days ago
  • backend(vllm): statically allocate LLMEngine (see the first sketch after this list)
    mfuntowicz committed 337 days ago
  • backend(vllm): plug in the tokio server and CLI
    mfuntowicz committed 336 days ago
  • backend(vllm): submit new request to vLLM engine (see the second sketch below)
    mfuntowicz committed 332 days ago
  • backend(vllm): remove python print stmt
    mfuntowicz committed 332 days ago
  • backend(vllm): make v1 the default
    mfuntowicz committed 330 days ago
  • backend(vllm): expose FFI for CompletionOutput and RequestOutput on Rust side
    mfuntowicz committed 330 days ago
  • backend(vllm): map RequestOutput to InferStreamResponse to stream back to the client (see the streaming sketch below)
    mfuntowicz committed 330 days ago
  • backend(vllm): disable metrics for now
    mfuntowicz committed 329 days ago
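
Together, the commits outline the backend's shape: one Python-side vLLM engine owned for the life of the Rust process, a tokio HTTP front end, and an FFI layer that carries requests in and outputs back. Below is a minimal sketch of the "statically allocate LLMEngine" step, assuming PyO3 as the interop layer (the PR does not show its bindings here); `LLM_ENGINE` and `init_engine` are hypothetical names, while `EngineArgs` and `LLMEngine.from_engine_args` are vLLM's public Python API:

```rust
use std::sync::OnceLock;

use pyo3::exceptions::PyRuntimeError;
use pyo3::prelude::*;
use pyo3::types::PyDict;

// Hypothetical process-wide handle to the single Python-side engine.
static LLM_ENGINE: OnceLock<Py<PyAny>> = OnceLock::new();

/// Build the engine once at startup; every later request reuses it.
fn init_engine(model: &str) -> PyResult<()> {
    Python::with_gil(|py| {
        let vllm = py.import("vllm")?;

        // EngineArgs(model=...) -> LLMEngine.from_engine_args(args)
        let kwargs = PyDict::new(py);
        kwargs.set_item("model", model)?;
        let args = vllm.getattr("EngineArgs")?.call((), Some(kwargs))?;
        let engine = vllm
            .getattr("LLMEngine")?
            .call_method1("from_engine_args", (args,))?;

        LLM_ENGINE
            .set(engine.into())
            .map_err(|_| PyRuntimeError::new_err("LLMEngine already initialized"))
    })
}
```

Allocating once matches vLLM's design: the engine owns the KV cache and scheduler state, so it has to outlive individual requests.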
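
Request submission then reduces to vLLM's `add_request` plus a `step()` loop. The sketch below blocks until completion for brevity, whereas the backend streams partial results (next sketch); `submit_blocking` is a hypothetical name, and `add_request`, `step`, `has_unfinished_requests`, and the `finished`/`outputs`/`text` attributes come from vLLM's `LLMEngine`, `RequestOutput`, and `CompletionOutput`:

```rust
use pyo3::prelude::*;
use pyo3::types::PyDict;

/// Hypothetical blocking submission: enqueue one prompt, then drive the
/// engine until vLLM reports the request as finished.
fn submit_blocking(engine: &Py<PyAny>, request_id: &str, prompt: &str) -> PyResult<String> {
    Python::with_gil(|py| {
        let vllm = py.import("vllm")?;
        let kwargs = PyDict::new(py);
        kwargs.set_item("max_tokens", 128)?;
        let params = vllm.getattr("SamplingParams")?.call((), Some(kwargs))?;

        // add_request only queues the prompt; decoding happens inside step().
        engine.call_method1(py, "add_request", (request_id, prompt, params))?;

        while engine
            .call_method0(py, "has_unfinished_requests")?
            .extract::<bool>(py)?
        {
            // step() runs one scheduler iteration and returns a
            // RequestOutput per in-flight request.
            let outputs: Vec<Py<PyAny>> = engine.call_method0(py, "step")?.extract(py)?;
            for out in outputs {
                if out.getattr(py, "finished")?.extract::<bool>(py)? {
                    // RequestOutput.outputs is a list of CompletionOutput.
                    let completions: Vec<Py<PyAny>> =
                        out.getattr(py, "outputs")?.extract(py)?;
                    return completions[0].getattr(py, "text")?.extract(py);
                }
            }
        }
        Err(pyo3::exceptions::PyRuntimeError::new_err("request never finished"))
    })
}
```

A production backend would drive `step()` on a dedicated thread shared by all requests rather than per call, which is what makes the streaming path below possible.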
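
Finally, each `RequestOutput` crossing the FFI boundary is flattened into the event type the router streams back to the client. TGI's actual `InferStreamResponse` also carries token ids, logprobs, and timing, so the `StreamEvent` enum below is a simplified stand-in and `map_output` a hypothetical helper:

```rust
use pyo3::prelude::*;
use tokio::sync::mpsc::UnboundedSender;

// Simplified stand-in for TGI's InferStreamResponse.
enum StreamEvent {
    Intermediate { text: String },
    End { text: String, finish_reason: String },
}

/// Hypothetical mapping from one vLLM RequestOutput to the event the
/// tokio server streams back to the HTTP client.
fn map_output(request_output: &PyAny, sender: &UnboundedSender<StreamEvent>) -> PyResult<()> {
    let finished: bool = request_output.getattr("finished")?.extract()?;

    // Take the first CompletionOutput; n > 1 completions would fan out here.
    let completion = request_output.getattr("outputs")?.get_item(0)?;
    let text: String = completion.getattr("text")?.extract()?;

    let event = if finished {
        // finish_reason is Optional[str] on the Python side.
        let finish_reason = completion
            .getattr("finish_reason")?
            .extract::<Option<String>>()?
            .unwrap_or_else(|| "stop".into());
        StreamEvent::End { text, finish_reason }
    } else {
        StreamEvent::Intermediate { text }
    };

    // The receiver half lives in the tokio server, which forwards events
    // to the client as server-sent events.
    let _ = sender.send(event);
    Ok(())
}
```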