text-generation-inference
[Backend] Introduce vLLM backend
#2976
Open

Commits
  • backend(vllm): initial commit
    mfuntowicz committed 338 days ago
  • backend(vllm): statically allocate LLMEngine (see the first sketch after this list)
    mfuntowicz committed 337 days ago
  • backend(vllm): plug in the tokio server and CLI
    mfuntowicz committed 336 days ago
  • backend(vllm): submit new request to vLLM engine (see the second sketch below)
    mfuntowicz committed 332 days ago
  • backend(vllm): remove python print stmt
    mfuntowicz committed 332 days ago
  • backend(vllm): make v1 the default
    mfuntowicz committed 330 days ago
  • backend(vllm): expose FFI for CompletionOutput and RequestOutput on Rust side
    mfuntowicz committed 330 days ago
  • backend(vllm): map RequestOutput to InferStreamResponse to stream back to the client (see the streaming sketch below)
    mfuntowicz committed 330 days ago
  • backend(vllm): disable metrics for now
    mfuntowicz committed 329 days ago
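
Together, the commits outline the backend's shape: one Python-side vLLM engine owned for the life of the Rust process, a tokio HTTP front end, and an FFI layer that carries requests in and outputs back. Below is a minimal sketch of the "statically allocate LLMEngine" step, assuming PyO3 as the interop layer (the PR does not show its bindings here); `LLM_ENGINE` and `init_engine` are hypothetical names, while `EngineArgs` and `LLMEngine.from_engine_args` are vLLM's public Python API:

```rust
use std::sync::OnceLock;

use pyo3::exceptions::PyRuntimeError;
use pyo3::prelude::*;
use pyo3::types::PyDict;

// Hypothetical process-wide handle to the single Python-side engine.
static LLM_ENGINE: OnceLock<Py<PyAny>> = OnceLock::new();

/// Build the engine once at startup; every later request reuses it.
fn init_engine(model: &str) -> PyResult<()> {
    Python::with_gil(|py| {
        let vllm = py.import("vllm")?;

        // EngineArgs(model=...) -> LLMEngine.from_engine_args(args)
        let kwargs = PyDict::new(py);
        kwargs.set_item("model", model)?;
        let args = vllm.getattr("EngineArgs")?.call((), Some(kwargs))?;
        let engine = vllm
            .getattr("LLMEngine")?
            .call_method1("from_engine_args", (args,))?;

        LLM_ENGINE
            .set(engine.into())
            .map_err(|_| PyRuntimeError::new_err("LLMEngine already initialized"))
    })
}
```

Allocating once matches vLLM's design: the engine owns the KV cache and scheduler state, so it has to outlive individual requests.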
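
Request submission then reduces to vLLM's `add_request` plus a `step()` loop. The sketch below blocks until completion for brevity, whereas the backend streams partial results (next sketch); `submit_blocking` is a hypothetical name, and `add_request`, `step`, `has_unfinished_requests`, and the `finished`/`outputs`/`text` attributes come from vLLM's `LLMEngine`, `RequestOutput`, and `CompletionOutput`:

```rust
use pyo3::prelude::*;
use pyo3::types::PyDict;

/// Hypothetical blocking submission: enqueue one prompt, then drive the
/// engine until vLLM reports the request as finished.
fn submit_blocking(engine: &Py<PyAny>, request_id: &str, prompt: &str) -> PyResult<String> {
    Python::with_gil(|py| {
        let vllm = py.import("vllm")?;
        let kwargs = PyDict::new(py);
        kwargs.set_item("max_tokens", 128)?;
        let params = vllm.getattr("SamplingParams")?.call((), Some(kwargs))?;

        // add_request only queues the prompt; decoding happens inside step().
        engine.call_method1(py, "add_request", (request_id, prompt, params))?;

        while engine
            .call_method0(py, "has_unfinished_requests")?
            .extract::<bool>(py)?
        {
            // step() runs one scheduler iteration and returns a
            // RequestOutput per in-flight request.
            let outputs: Vec<Py<PyAny>> = engine.call_method0(py, "step")?.extract(py)?;
            for out in outputs {
                if out.getattr(py, "finished")?.extract::<bool>(py)? {
                    // RequestOutput.outputs is a list of CompletionOutput.
                    let completions: Vec<Py<PyAny>> =
                        out.getattr(py, "outputs")?.extract(py)?;
                    return completions[0].getattr(py, "text")?.extract(py);
                }
            }
        }
        Err(pyo3::exceptions::PyRuntimeError::new_err("request never finished"))
    })
}
```

A production backend would drive `step()` on a dedicated thread shared by all requests rather than per call, which is what makes the streaming path below possible.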
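
Finally, each `RequestOutput` crossing the FFI boundary is flattened into the event type the router streams back to the client. TGI's actual `InferStreamResponse` also carries token ids, logprobs, and timing, so the `StreamEvent` enum below is a simplified stand-in and `map_output` a hypothetical helper:

```rust
use pyo3::prelude::*;
use tokio::sync::mpsc::UnboundedSender;

// Simplified stand-in for TGI's InferStreamResponse.
enum StreamEvent {
    Intermediate { text: String },
    End { text: String, finish_reason: String },
}

/// Hypothetical mapping from one vLLM RequestOutput to the event the
/// tokio server streams back to the HTTP client.
fn map_output(request_output: &PyAny, sender: &UnboundedSender<StreamEvent>) -> PyResult<()> {
    let finished: bool = request_output.getattr("finished")?.extract()?;

    // Take the first CompletionOutput; n > 1 completions would fan out here.
    let completion = request_output.getattr("outputs")?.get_item(0)?;
    let text: String = completion.getattr("text")?.extract()?;

    let event = if finished {
        // finish_reason is Optional[str] on the Python side.
        let finish_reason = completion
            .getattr("finish_reason")?
            .extract::<Option<String>>()?
            .unwrap_or_else(|| "stop".into());
        StreamEvent::End { text, finish_reason }
    } else {
        StreamEvent::Intermediate { text }
    };

    // The receiver half lives in the tokio server, which forwards events
    // to the client as server-sent events.
    let _ = sender.send(event);
    Ok(())
}
```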