vllm-project/vllm
Open Pull Requests
[torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation
#36093 opened 2026-03-05 04:58 by zou3519
[ROCm] Fix AITER ops fake impl and minor bugs
Labels: rocm
#36092 opened 2026-03-05 04:57 by ChuanLi1101
[ROCm][CI] Making some tests optional to reduce workload
Labels: rocm, ci/build
#36090 opened 2026-03-05 04:46 by AndreasKaratzas
[Bugfix] Handle TimeoutError in Voxtral buffer_realtime_audio to prevent silent hang
Labels: bug
#36089 opened 2026-03-05 04:32 by OiPunk
Don't fire ray compatibility webhook when PR or branch is not provided
Labels: ci/build
#36088 opened 2026-03-05 04:22 by jeffreywang-anyscale
[AMD][Build] Add DeepEP to ROCm Dockerfile
Labels: rocm, ci/build
#36086 opened 2026-03-05 04:15 by rjrock
[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize`
Labels: documentation, performance, ready, v1, nvidia, ready-run-all-tests
#36085 opened 2026-03-05 03:57 by jikunshang
Add adaptive decode chunking for SM100 fused TRTLLM path (TMP FIX) #34988
Labels: v1, nvidia
#36083 opened 2026-03-05 03:46 by baonudesifeizhai
Perf: Optimize DeepEP prepare/finalize for identity mapping
#36081 opened 2026-03-05 03:11 by xueliangyang-oeuler
[PluggableLayer][4/N] Apply PluggableLayer to remaining layers
#36080 opened 2026-03-05 03:09 by whx-sjtu
Yejin/bench sleep wake timeout
Labels: performance, frontend, tpu, needs-rebase, v1, cpu
#36079 opened 2026-03-05 02:59 by YJYJLee
Enable ModelRunnerV2 on XPU
Labels: v1
#36078 opened 2026-03-05 02:11 by xinyu-intel
Revert "[Hardware] Replace `torch.cuda.empty_cache` with `torch.accelerator.empty_cache`" (#30681)
Labels: documentation, performance, structured-output, v1, nvidia
#36076 opened 2026-03-05 02:05 by zhewenl
[Docs] Add doc note about building for free-threaded Python.
Labels: documentation
#36074 opened 2026-03-05 01:45 by nascheme
[WIP][Proof of concept] Overlap model loading and torch.compile
Labels: frontend, v1
#36072 opened 2026-03-05 01:35 by zou3519
[Bugfix][DCP] Fix CUDA graph capture for Decode Context Parallelism
Labels: bug, v1, nvidia
#36070 opened 2026-03-05 01:01 by sungsooha
fix(lora): bounds-check lora_a/lora_b in MergedColumnParallelLinear.set_lora
#36069 opened 2026-03-05 00:43 by JackYoung27
[Bugfix] Allow inherited_fds to be None to fix warnings when using spawn
Labels: bug, v1
#36068 opened 2026-03-05 00:30 by tjohnson31415
set VLLM_USE_BYTECODE_HOOK to 0 by default
#36067 opened 2026-03-05 00:26 by laithsakka
test Qwen/Qwen3-4B-Instruct-2507 for unbacked
Labels: qwen
#36064 opened 2026-03-05 00:00 by laithsakka
[Refactor] Consolidate SupportsEagle
Labels: v1, llama, qwen, gpt-oss, kv-connector
#36063 opened 2026-03-04 23:54 by benchislett
[Kernel] [Helion] [11/N] Retune configs for silu_mul_fp8
Labels: ready
#36062 opened 2026-03-04 23:51 by gmagogsfm
[Bugfix] Fix DP/EP Shared Expert With Monolithic Kernels
Labels: bug
#36061 opened 2026-03-04 23:51 by robertgshaw2-redhat
fix: force prefill path for MTP drafting on SM121 (GB10 Spark)
Labels: v1, nvidia
#36060 opened 2026-03-04 23:34 by scottgl9
[BugFix] Fallback from FA4->FA2 for Batch Invariance
Labels: bug, v1
#36059 opened 2026-03-04 23:08 by frankwang28
[2/n] Migrate per_token_group_quant to torch stable ABI
Labels: ci/build, nvidia
#36058 opened 2026-03-04 23:06 by mikaylagawarecki
Adding deterministic lora benchmarking to vLLM Bench
Labels: performance
#36057 opened 2026-03-04 22:58 by RonaldBXu
[Bugfix] Fix Deepseekv32 tool parser when stream interval > 1
Labels: bug, deepseek
#36056 opened 2026-03-04 22:48 by sfeng33
[Bugfix] Fix zombie EngineCore processes after parent exit
Labels: bug, v1
#36055 opened 2026-03-04 22:46 by AjAnubolu
[Bugfix] Fix tokenize endpoint malformed token_strs
Labels: bug, frontend
#36054 opened 2026-03-04 22:41 by AjAnubolu