Commits: vllm-project/vllm

[Frontend] Add multi-server frontend for K8s pod health aggregation

Robert Shaw committed 27 days ago

e0f7ae54

[Model] Add NVFP4 quantization support for Step3.5-Flash (#34478)

tacos8me committed 28 days ago

Verified b7892a3b

[Bug] Refactor max_num_batched_tokens to account for drafting (#34898)

benchislett committed 28 days ago

Verified 682566b1

[Spec Decode] Defer clearing KV connector metadata for EAGLE3 speculative decode + prefill / decode disagg setup (#34529)

zixi-qi committed 28 days ago

Verified b9c2a565

[ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation delays (#35052)

AndreasKaratzas committed 28 days ago

Verified dd8c3a7f

[ROCm][CI] Fix flaky embedding chat test by using tolerance-based comparison (#35050)

AndreasKaratzas committed 28 days ago

Verified a8a47c17

[Bugfix] Fix Qwen3/Qwen3.5 Reasoning Parser (#34779)

ywang96 committed 28 days ago

Verified 40f88d83

[Model Runner V2] Enable CUDA graph for Eagle3 (#35040)

WoosukKwon committed 28 days ago

Verified 2cbf9656

Fix apply_top_k_top_p_triton called by non-cuda logits Tensor (#35030)

xli committed 28 days ago

Verified 30132cd1

[Benchmark] Use `sns.relplot` for plotting (#35027)

DarkLight1337 committed 28 days ago

Verified cbd95a2d

[New Model] Add ColModernVBERT (#34558)

athrael-soju committed 28 days ago

Verified 970861ac

[CI] Bump mteb version to `mteb[bm25s]>=2, <3` for pooling model unit tests (#34961)

yewentao256 committed 28 days ago

Verified d24bdd7c

[CI] Stabilizing ROCm amd-ci signal and minor name fix in upstream (#35008)

AndreasKaratzas committed 28 days ago

Verified d403c1da

[Model Runner V2] Support attention group (#35036)

WoosukKwon committed 28 days ago

Verified b71fbd06

[Model Bash][DSR1] Add selective dynamic shape marking for CustomOp (#34900)

vadiklyutiy committed 28 days ago

Verified 74d90b1c

[Model Runner V2] Support Eagle3 (no CUDA graph) (#35029)

WoosukKwon committed 28 days ago

Verified a4047d4e

[CI/Build] Fix gRPC version mismatch (#35013)

DarkLight1337 committed 29 days ago

Verified 965fe459

[Frontend] Add automatic language detection for Whisper transcription (#34342)

spacecheck committed 29 days ago

Verified 98b0205c

[Bugfix] Gate 256-bit instructions to CUDA 12.9+ (#34791)

huydhn committed 29 days ago

Verified 272b535a

[Benchmark] Improve benchmarks (#35012)

DarkLight1337 committed 29 days ago

Verified f74f1572

[Doc] Fix example of eagle3 (#34960)

petrpechman committed 29 days ago

Verified bebfe55b

[Core] Minor structured-output related scheduler optimization (#34765)

njhill committed 29 days ago

Verified 820d7815

[PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896)

NickLucche committed 29 days ago

Verified ab6f3487

[ROCm] Enable bitsandbytes quantization support on ROCm (#34688)

Abdennacer-Badaoui committed 29 days ago

Verified 8dc8a99b

[ROCM] Optimize ROCM_AITER_FA spec decode eagle performance (#34541)

jennyyyyzhen committed 29 days ago

Verified 2aab2bb5

[ROCm][CI] Fix spec decode logprobs flakiness and parametrize tree attention backends (#34599)

AndreasKaratzas committed 29 days ago

Verified 54254f7a

[ROCm][AITER] Fix aiter paged_attention_v1 decode for sliding window and head_size < 64 (#34570)

AndreasKaratzas committed 29 days ago

Verified cf93c1a1

[CI] Fix ColBERT HF comparison tests on AMD CI + refactor (#34567)

AndreasKaratzas committed 29 days ago

Verified 89358f0d

[feat] Add per-block extra_keys to KV events (#33304)

zhongdaor-nv committed 29 days ago

Verified a0fe7ea2

[CI][MCP][Harmony] Heavy refactoring Harmony & MCP response tests and stabilizing with deterministic test infrastructure (#33949)

AndreasKaratzas committed 29 days ago

Verified 991d6bff