vllm-project/vllm
Pull Requests
Revert "[Bugfix] Fix RMSNorm kernels to multiply in weight's native dtype" (#42379)
bug
documentation
ready
verified
#44088 opened 2026-05-31 02:06 by
vllm-agent
fix(spec_decode): ensure draft model uses correct parallel/model config in all proposers
speculative-decoding
v1
#44086 opened 2026-05-30 23:52 by
anmolxlight
[Bugfix] Quark: set moe_quant_config in QuarkW8A8Int8MoEMethod
bug
#44085 opened 2026-05-30 23:51 by
jashwanth-12
fix: bind structured-output grammar on Responses API when reasoning parser is active
frontend
#44084 opened 2026-05-30 23:38 by
anmolxlight
[Core] Avoid scheduler `RoutedExpertsManager` import unless needed
ready
v1
#44083 opened 2026-05-30 23:25 by
njhill
[Bugfix] Cache the EAGLE/MTP lookahead block in the SWA prefix-cache mask
bug
v1
kv-connector
#44082 opened 2026-05-30 22:48 by
ivanium
[Perf] Gate AR+RMSNorm fusion cap on the active fast all-reduce threshold
#44080 opened 2026-05-30 21:41 by
A1c0r-Z
[MRV2] Remove Eagle's dedicated CUDA graph pool
ready
v1
nvidia
#44078 opened 2026-05-30 21:15 by
LucasWilkinson
refactor(envs): migrate vllm/envs.py to pydantic-settings
structured-output
intel-gpu
v1
#44077 opened 2026-05-30 20:10 by
AnshManwani
Bump the minor-update group across 1 directory with 148 updates
rocm
ci/build
nvidia
dependencies
#44076 opened 2026-05-30 20:07 by
dependabot[bot]
[ROCm][Perf] Fused MoE W4A16 HIP kernel for AMD RDNA3 (gfx1100)
rocm
ready
ci/build
verified
#44075 opened 2026-05-30 19:29 by
JartX
[Core] Pluggable sleep-mode backend abstraction (RFC #34303)
v1
#44074 opened 2026-05-30 18:52 by
matteso1
[Bugfix] Add FlashInfer B12x to linear backend selection
bug
#44073 opened 2026-05-30 18:34 by
mmangkad
[Bugfix] Expand packed module names in GPTQ modules_in_block_to_quantize
bug
#44072 opened 2026-05-30 18:17 by
ben7am1n
Add env var for FlashInfer autotune cache
documentation
#44071 opened 2026-05-30 18:05 by
mmangkad
fix(config): reject negative max_logprobs (except -1) and long_prefill_token_threshold
#44070 opened 2026-05-30 17:33 by
hclsys
[MRV2][DSV4] Reset FlashMLA tile metadata in prefill capture
ready
v1
nvidia
#44069 opened 2026-05-30 16:53 by
WoosukKwon
fix: skip stale LRU cache order entries
#44068 opened 2026-05-30 15:46 by
he-yufeng
docs: fix tokenizer optimization typo
documentation
#44066 opened 2026-05-30 14:27 by
chunyang-wen
[FlashAttention] Sync FA with upstream
ready
ci/build
#44065 opened 2026-05-30 12:37 by
MatthewBonanni
[KV Connector] Add KDA 4-state support to NixlConnector for Kimi Linear PD disaggregation
v1
kv-connector
#44064 opened 2026-05-30 12:11 by
JaredforReal
Int4 kivi kv cache
v1
nvidia
#44059 opened 2026-05-30 09:06 by
a1exxd0
[Bugfix][Tool Parsers] Validate JSON arguments in extract_tool_calls for Kimi K2, DeepSeek V3/V3.1
bug
tool-calling
deepseek
#44058 opened 2026-05-30 08:59 by
rahulsolanki001
[Bugfix] Reject non-positive values for ParallelConfig int knobs
bug
#44057 opened 2026-05-30 08:31 by
jwzheng96
fix(ngram): match async ngram_gpu acceptance rate to CPU
speculative-decoding
v1
#44056 opened 2026-05-30 07:49 by
shiyangyang2001-lgtm
docs(nixl): document KV Transfer metrics log fields and Prometheus counters
documentation
kv-connector
#44055 opened 2026-05-30 07:47 by
sridhar-3009
fix(ngram): sync ngram_gpu acceptance rate to match CPU
v1
#44054 opened 2026-05-30 07:44 by
shiyangyang2001-lgtm
[Bugfix][V1][TurboQuant] Reserve workspace before CUDA graph capture
bug
v1
nvidia
#44053 opened 2026-05-30 06:38 by
Bot1822
Add speculative decoding metrics
frontend
v1
#44052 opened 2026-05-30 05:32 by
naomili0924
[CI] Stabilize the multi-audio OpenAI server path
frontend
multi-modality
#44051 opened 2026-05-30 05:06 by
AndreasKaratzas