Pull Requests vllm-project/vllm

[BugFix] Fix minimax m2 model rotary_dim (labels: ready)

#30384 opened 2025-12-10 10:37 by rogeryoungh

adding constraint updates of cos-sin to improve mrope performance

#30377 opened 2025-12-10 08:48 by wujinyuan1

[Fix]fix import error from lmcache kv-connector

#30376 opened 2025-12-10 08:38 by wz1qqx

Implement LMDB-based multi-modal cache (labels: ci/build, v1, multi-modality)

#30373 opened 2025-12-10 07:21 by petersalas

[Fix] Add default rope theta for qwen1 model (labels: qwen)

#30369 opened 2025-12-10 02:36 by iwzbi

[Bug Fix] Fix Kimi-Linear model initialization crash due to missing 'indexer_rotary_emb' arg

#30366 opened 2025-12-10 00:02 by yonasTMC

fix(gguf): Auto-select compatible dtype for GGUF models on Blackwell

#30365 opened 2025-12-09 23:59 by kitaekatt

[Bugfix] awq_gemm: fix argument order swap

#30364 opened 2025-12-09 23:17 by mgehre-amd

Remove all2all backend envvar documentation ci/build

#30363 opened 2025-12-09 23:09 by elizabetht

[WIP] Bump dockerfile to cuda 13.0.2 (for testing) (labels: ci/build, nvidia)

#30362 opened 2025-12-09 22:51 by dougbtv

[Attention][AMD] Make flash-attn optional (labels: rocm, speculative-decoding, v1)

#30361 opened 2025-12-09 22:46 by mgehre-amd

Upstream fp8 with static scales gpt oss (labels: needs-rebase, gpt-oss)

#30357 opened 2025-12-09 19:49 by maleksan85

[CI][DeepSeek] Add nightly DeepSeek R1 `lm_eval` tests on H200 (labels: ready, ci/build, deepseek)

#30356 opened 2025-12-09 18:05 by MatthewBonanni

[Fix] Handle multiple tool calls in Qwen3-MTP tool parser (labels: frontend, tool-calling, qwen)

#30353 opened 2025-12-09 17:48 by ArkVex

Remove virtual engine handling (labels: tpu, needs-rebase, v1, codex, qwen, kv-connector)

#30350 opened 2025-12-09 16:34 by WoosukKwon

[Docs]: adds a new metric vllm:request_prefill_kv_computed_tokens in docs (labels: documentation)

#30348 opened 2025-12-09 15:24 by googs1025

[Core] Major fix catch backend grammar exceptions (xgrammar, outlines, etc) in scheduler (labels: v1)

#30346 opened 2025-12-09 14:58 by blancsw

[Bugfix] Fix HunyuanOCR cross-image contamination in batch processing

#30344 opened 2025-12-09 14:49 by anker-c2

[CI] refine more logic when generating and using nightly wheels & indices (labels: ci/build)

#30341 opened 2025-12-09 14:17 by Harry-Chen

Add Eagle and Eagle3 support to Transformers modeling backend

#30340 opened 2025-12-09 14:09 by hmellor

Fix gigachat3 parser + update tests (labels: frontend, tool-calling)

#30338 opened 2025-12-09 13:37 by ajpqs

fix: enhance human_readable_int function

#30337 opened 2025-12-09 13:28 by andyxning

[Bugfix]: Streaming i/o of batch files. Resolves #30268 (labels: frontend, ci/build)

#30334 opened 2025-12-09 11:32 by umgefahren

[Bugfix] tpu_model_runner: set vllm config context when calling reset_dynamo_cache() (labels: tpu, ready, v1)

#30331 opened 2025-12-09 11:08 by dtrifiro

[Feature][CPU Backend]: Add PyTorch vectorized backend

#30329 opened 2025-12-09 10:18 by Radu2k

[Frontend] [Doc] Exclude log deltas feature (labels: frontend)

#30322 opened 2025-12-09 09:13 by Catacomba

[BugFix] Spec decode with VLLM_ENABLE_V1_MULTIPROCESSING=0 (labels: v1)

#30319 opened 2025-12-09 08:26 by heheda12345

[Frontend] Allow users to modify the scheduler configuration online in dev mode. (labels: frontend, v1)

#30316 opened 2025-12-09 08:06 by noooop

Generalize pooling model support with multi-task, multi-layer, multi-label classification that can be pooled from both hidden states and LM head's logits.

#30315 opened 2025-12-09 07:56 by kflu

[fix] fix SM check for Flashinfer TRTLLM MOE (labels: nvidia)

#30314 opened 2025-12-09 07:12 by jiahanc