vllm-project/vllm
Pull Requests
AMD CI Test - unskip moe_sum test
Labels: rocm
#32039 opened 2026-01-09 17:32 by hongxiayang
[CPU][BugFix] Disable AOT Compile for CPU
#32037 opened 2026-01-09 17:19 by fadara01
[responsesAPI] add unit test for optional function tool call id
#32036 opened 2026-01-09 17:06 by qandrew
[Docs] Add docs about OOT Quantization Plugins
Labels: documentation
#32035 opened 2026-01-09 16:48 by mgoin
[Refactor] Remove numpy split in async scheduling
Labels: ready, v1
#32034 opened 2026-01-09 16:39 by yewentao256
[CI/Build] Publish CPU images for each release
Labels: ci/build
#32032 opened 2026-01-09 15:24 by nathan-weinberg
[NIXL][Bugfix] Failure logging overhaul + early metadata free on failure
Labels: v1, kv-connector
#32031 opened 2026-01-09 15:12 by NickLucche
Add: acceptance length tests
Labels: speculative-decoding, v1
#32030 opened 2026-01-09 14:44 by rahul-tuli
[Refactor] Separate sequence and token pooling types
Labels: documentation, new-model, frontend, ready, qwen
#32026 opened 2026-01-09 11:57 by DarkLight1337
[Frontend] Try multiple prompts in benchmark initial test run
Labels: performance
#32024 opened 2026-01-09 10:31 by MatteoFari
[Bugfix] fix memory inconsistency in cross-process shared memory
#32022 opened 2026-01-09 10:18 by slippersss
[WIP] Optimize greedy sample.
Labels: tpu, v1
#32021 opened 2026-01-09 10:09 by whx-sjtu
[LoRA][Perf] Improve FusedMoE LoRA performance for small rank
#32019 opened 2026-01-09 09:52 by xyang16
[Frontend] `finish_reason` must be `tool_call` whenever a tool is called
Labels: frontend
#32018 opened 2026-01-09 08:55 by sanghoon-yn
[MLA] Support DCP + FP8
Labels: v1
#32014 opened 2026-01-09 05:19 by LucasWilkinson
[Fix] Qwen3-VL-MoE bitsandbytes 4 bit quant
Labels: qwen
#32013 opened 2026-01-09 05:09 by Datta0
[MISC] Add strict contiguity check for FlashInfer attention tensors
Labels: ready, v1, nvidia
#32008 opened 2026-01-09 02:21 by vadiklyutiy
Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras.
Labels: v1, nvidia
#32005 opened 2026-01-09 01:41 by yugong333
fused_moe_kernel - cast accumulator after applying router weights
#32002 opened 2026-01-09 00:54 by gnovack
[fix] add cutedsl to global sf
Labels: ready, nvidia
#32001 opened 2026-01-09 00:39 by jiahanc
[Misc] Enable async scheduling by default with spec decoding
Labels: ready
#31998 opened 2026-01-08 23:31 by njhill
[CI/Build][Hardware][AMD] Fix test_forward_error
Labels: rocm, v1
#31997 opened 2026-01-08 23:03 by rjrock
[MoE Refactor] Move `select_experts` from `FusedMoEQuantMethod` -> `FusedMoE`
Labels: needs-rebase
#31996 opened 2026-01-08 22:10 by bnellnm
[ROCM] Add ROCm image build to release pipeline
Labels: rocm, ci/build
#31995 opened 2026-01-08 21:56 by dllehr-amd
[Misc][PD] Fix `get_attn_backend` usage in transfer connectors
Labels: ready, v1, kv-connector
#31988 opened 2026-01-08 18:20 by NickLucche
[Feature][#29390]: Add timeout support to MultiprocExecutor.collective_rpc and FutureWrapper
Labels: v1
#31986 opened 2026-01-08 17:31 by SandishKumarHN
[Kernel] Optimize Sliding Window Attention in 3D Triton Kernel
#31984 opened 2026-01-08 17:18 by jvlunteren
[Bugfix] Fix Fp8 Triton for non-gated MoE (Nemotron)
#31983 opened 2026-01-08 17:09 by danisereb
[Misc] Clean up world_size > avail_gpu warning for ray
Labels: v1
#31981 opened 2026-01-08 17:05 by ruisearch42
Add mergify label job for "bug" match
Labels: ci/build
#31980 opened 2026-01-08 16:59 by mgoin