vllm-project/vllm
Open Pull Requests
- Reapply [Attention][FA3] Update FA3 to include new swizzle optimization
  #34043 opened 2026-02-07 06:03 by LucasWilkinson (labels: ready, ci/build, v1)

- [ROCm][AITER] Add fused RoPE+KVCache pass with MultiOutputPattern fix
  #34037 opened 2026-02-07 03:48 by spaparaju (labels: rocm, needs-rebase, v1, gpt-oss)

- [BugFix] Fix mm_encoder_only init for qwen3 vl moe model
  #34033 opened 2026-02-07 03:04 by shepark (labels: bug, qwen)

- [ROCm] Update triton branch to support gpt-oss models for gfx11xx devices
  #34032 opened 2026-02-07 02:00 by hongxiayang (labels: rocm, ci/build, gpt-oss)

- [CI][torch.compile] Fix incorrect filtering for E2E fusion tests on B200
  #34031 opened 2026-02-07 01:56 by ProExpertProg (labels: ready, ci/build)

- [Bugfix] Add reasoning_content backward compat to DeltaMessage for streaming
  #34030 opened 2026-02-07 00:58 by cradonn (labels: bug, frontend)

- [Perf] Optimize async scheduling redundant copy, 0.9% E2E throughput improvement
  #34029 opened 2026-02-07 00:35 by yewentao256 (labels: ready, v1)

- [Bugfix] supported_tasks is breaking backward compatibility at init_app_state
  #34027 opened 2026-02-06 23:26 by kouroshHakha (labels: bug, frontend, ready)

- Add --insecure arg to vllm bench to skip TLS
  #34026 opened 2026-02-06 23:14 by fanyang-real (labels: performance)

- [Kernel][Helion][5/N] Add Helion Autotuning infrastructure
  #34025 opened 2026-02-06 23:08 by gmagogsfm

- [Core] Add Helix (Context + Tensor) Parallelism
  #34024 opened 2026-02-06 22:44 by sungsooha (labels: documentation, v1, llama, nvidia)

- [Bugfix] Fix RAW hazard and optimize stores in EP Scatter Kernel
  #34023 opened 2026-02-06 22:17 by Manikvsin (labels: bug)

- [Misc][Spec Decode] Support different load config for draft model
  #34022 opened 2026-02-06 22:00 by ZhengkaiZ (labels: speculative-decoding, v1)

- [Bugfix] Fix Worker.load_model context-manager composition for sleep mode
  #34021 opened 2026-02-06 21:40 by tianshu-Michael-yu (labels: bug, ready, v1)

- [WIP] Layerwise loading for fp8.py, take 2
  #34020 opened 2026-02-06 21:23 by vkuzo

- [Quantization][Refactor] Clean up GPTQ + AWQ quantization
  #34019 opened 2026-02-06 20:51 by mu-hashmi

- Threshold fix for wvSplitk for occasional CI failures
  #34013 opened 2026-02-06 19:12 by amd-hhashemi (labels: rocm)

- [Bugfix] Fix DP Attention Padding in Dummy Run
  #34009 opened 2026-02-06 18:16 by benchislett (labels: bug, ready, v1)

- [Hybrid] Enable mamba prefix cache "align" mode with async scheduling
  #33997 opened 2026-02-06 14:44 by tdoublep (labels: v1)

- Bump `lm-eval` version for Transformers v5 compatibility
  #33994 opened 2026-02-06 13:53 by hmellor (labels: documentation, rocm, ready, needs-rebase, ci/build)

- [Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs
  #33992 opened 2026-02-06 12:35 by ehfd (labels: bug, documentation, ci/build, nvidia)

- Pass modality information in embed_multimodal
  #33990 opened 2026-02-06 11:31 by reaganjlee (labels: speculative-decoding, v1, qwen)

- [Bugfix][Frontend] Fix IndexError in Mistral tool parser during streaming tool calls
  #33988 opened 2026-02-06 11:13 by veeceey (labels: bug, frontend)

- [Kernel] FlashInfer: switch allreduce fusion to unified API
  #33985 opened 2026-02-06 10:06 by mmangkad (labels: performance)

- [CPU][PPC64] Fix bf16 path in mla_decode.cpp
  #33983 opened 2026-02-06 09:32 by Akashcodes732 (labels: cpu)

- Fix: reject non-text content in system/developer messages
  #33981 opened 2026-02-06 09:21 by veeceey (labels: frontend)

- [Frontend] Add --disable-log-prefix flag and VLLM_DISABLE_LOG_PREFIX env var
  #33979 opened 2026-02-06 08:53 by veeceey (labels: frontend, v1)

- Scale input before applying Marlin operator
  #33972 opened 2026-02-06 07:21 by ir1ka

- [Frontend] Add --disable-uvicorn-metrics-access-log shorthand flag
  #33969 opened 2026-02-06 05:56 by veeceey (labels: documentation, frontend)

- [Bugfix] Fix Qwen3-Coder tool call streaming for duplicate names and param parsing
  #33965 opened 2026-02-06 04:51 by alexbi29 (labels: bug, qwen)