Pull Requests vllm-project/vllm

Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (labels: ready, ci/build, v1)

#34043 opened 2026-02-07 06:03 by LucasWilkinson

[ROCm][AITER] Add fused RoPE+KVCache pass with MultiOutputPattern fix (labels: rocm, needs-rebase, v1, gpt-oss)

#34037 opened 2026-02-07 03:48 by spaparaju

[BugFix] Fix mm_encoder_only init for qwen3 vl moe model (labels: bug, qwen)

#34033 opened 2026-02-07 03:04 by shepark

[ROCm] update triton branch to support gpt-oss models for gfx11xx devices (labels: rocm, ci/build, gpt-oss)

#34032 opened 2026-02-07 02:00 by hongxiayang

[CI][torch.compile] Fix incorrect filtering for E2E fusion tests on B200 (labels: ready, ci/build)

#34031 opened 2026-02-07 01:56 by ProExpertProg

[Bugfix] Add reasoning_content backward compat to DeltaMessage for streaming (labels: bug, frontend)

#34030 opened 2026-02-07 00:58 by cradonn

[Perf] Optimize async scheduling redundant copy, 0.9% E2E throughput improvement (labels: ready, v1)

#34029 opened 2026-02-07 00:35 by yewentao256

[bug-fix] supported_tasks is breaking backward compatibility at init_app_state (labels: bug, frontend, ready)

#34027 opened 2026-02-06 23:26 by kouroshHakha

add --insecure arg to the vllm bench to skip TLS (labels: performance)

#34026 opened 2026-02-06 23:14 by fanyang-real

[Kernel] [Helion] [5/N] Add Helion Autotuning infrastructure

#34025 opened 2026-02-06 23:08 by gmagogsfm

[Core] Add Helix (Context + Tensor) Parallelism documentation (labels: v1, llama, nvidia)

#34024 opened 2026-02-06 22:44 by sungsooha

[Bugfix] Fix RAW hazard and optimize stores in EP Scatter Kernel (labels: bug)

#34023 opened 2026-02-06 22:17 by Manikvsin

[Misc][Spec Decode] support different load config for draft model (labels: speculative-decoding, v1)

#34022 opened 2026-02-06 22:00 by ZhengkaiZ

[Bugfix] Fix Worker.load_model context-manager composition for sleep mode (labels: bug, ready, v1)

#34021 opened 2026-02-06 21:40 by tianshu-Michael-yu

[wip] layerwise loading for fp8.py, take 2

#34020 opened 2026-02-06 21:23 by vkuzo

[Quantization][Refactor] Clean up GPTQ + AWQ quantization

#34019 opened 2026-02-06 20:51 by mu-hashmi

Threshold fix wvSplitk for occasional CI fails (labels: rocm)

#34013 opened 2026-02-06 19:12 by amd-hhashemi

[Bugfix] Fix DP Attention Padding in Dummy Run (labels: bug, ready, v1)

#34009 opened 2026-02-06 18:16 by benchislett

[Hybrid] Enable mamba prefix cache "align" mode with async scheduling (labels: v1)

#33997 opened 2026-02-06 14:44 by tdoublep

Bump `lm-eval` version for Transformers v5 compatibility (labels: documentation, rocm, ready, needs-rebase, ci/build)

#33994 opened 2026-02-06 13:53 by hmellor

[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs (labels: bug, documentation, ci/build, nvidia)

#33992 opened 2026-02-06 12:35 by ehfd

Pass modality information in embed_multimodal (labels: speculative-decoding, v1, qwen)

#33990 opened 2026-02-06 11:31 by reaganjlee

[Bugfix][Frontend] Fix IndexError in Mistral tool parser during streaming tool calls (labels: bug, frontend)

#33988 opened 2026-02-06 11:13 by veeceey

[Kernel] FlashInfer: switch allreduce fusion to unified API (labels: performance)

#33985 opened 2026-02-06 10:06 by mmangkad

[CPU][PPC64] Fix bf16 path in mla_decode.cpp (labels: cpu)

#33983 opened 2026-02-06 09:32 by Akashcodes732

fix: reject non-text content in system/developer messages (labels: frontend)

#33981 opened 2026-02-06 09:21 by veeceey

[Frontend] Add --disable-log-prefix flag and VLLM_DISABLE_LOG_PREFIX env var (labels: frontend, v1)

#33979 opened 2026-02-06 08:53 by veeceey

Scale input before applying Marlin operator

#33972 opened 2026-02-06 07:21 by ir1ka

[Frontend] Add --disable-uvicorn-metrics-access-log shorthand flag (labels: documentation, frontend)

#33969 opened 2026-02-06 05:56 by veeceey

[Bugfix] Fix Qwen3-Coder tool call streaming for duplicate names and param parsing (labels: bug, qwen)

#33965 opened 2026-02-06 04:51 by alexbi29