vllm-project/vllm
Pull Requests
[Metrics] Clamp prefill KV computed token metric
v1
#42256 opened 2026-05-11 00:17 by sladyn98
Add tool parser for Nvidia Nemotron family models
documentation
tool-calling
nvidia
#42255 opened 2026-05-11 00:14 by sniper35
The Best of Times
v1
nvidia
#42254 opened 2026-05-10 23:59 by vamsiDT
docs: add Agent Friendly badge
documentation
#42253 opened 2026-05-10 23:47 by AbhiOnGithub
[MoE Refactor] Mk construct
needs-rebase
cpu
gpt-oss
nvidia
#42252 opened 2026-05-10 23:27 by bnellnm
[Perf] Auto-compile trivial CustomOp fallbacks to complete GemmaRMSNorm fusion under enforce_eager
#42251 opened 2026-05-10 22:38 by Ray-RP
[Bugfix][Model] Gemma4 MoE routing closure captures per_expert_scale, breaking functional_call substitution
bug
#42250 opened 2026-05-10 21:47 by NoeliaBentancor
Add recompute partial rollout sleep mode
documentation
frontend
v1
cpu
#42249 opened 2026-05-10 21:26 by LYMDLUT
[ROCm] Avoid full KV cache dequant in MLA decode fallback
rocm
v1
#42248 opened 2026-05-10 19:40 by Bortlesboat
[ROCm] Normalize fp8 scales through float32
rocm
#42247 opened 2026-05-10 19:03 by Bortlesboat
Fix EXAONE-4.5 to align with Transformers update
#42246 opened 2026-05-10 18:00 by lkm2835
[Bugfix] Fix prompt_logprobs non-determinism with prefix caching (issue #42019)
bug
v1
#42245 opened 2026-05-10 17:35 by factnn
Avoid silent weights corruption when loading Nemotron Nano VL with reusable-buffer loaders like runai distributed streaming
multi-modality
verified
#42244 opened 2026-05-10 17:34 by noa-neria
[Performance] Build logprobs context IDs incrementally
v1
#42243 opened 2026-05-10 17:19 by lokashrinav
[LoRA] Support 2D and 3D MoE LoRA adapter at the same
documentation
frontend
qwen
#42242 opened 2026-05-10 17:11 by jeejeelee
[Perf] [Qwen3.5] Reenable torch compile for `rearrange_mixed_qkv` in GDN linear attention
qwen
#42241 opened 2026-05-10 17:08 by tjtanaa
[Bugfix][ROCm] Force splitK=0 in AiterFp8BlockScaledMMKernel for determinism
bug
rocm
#42240 opened 2026-05-10 16:36 by maeehart
[CI][Bugfix] De-flake Fusion E2E TP2 test
bug
v1
#42238 opened 2026-05-10 15:36 by haosdent
[Bugfix] Rewrite Gemma4 streaming tool parser
bug
tool-calling
#42237 opened 2026-05-10 15:32 by whytem
[DSv4] Improved dequant gather K cache kernel
v1
DSv4
#42236 opened 2026-05-10 15:29 by gau-nernst
[Kernel][Performance] Add FlashInfer cutedsl NVFP4 GEMM backend
nvidia
#42235 opened 2026-05-10 15:27 by mmangkad
[Bugfix] Fix scipy audio resampling ratio
bug
multi-modality
#42233 opened 2026-05-10 15:17 by BWAAEEEK
[Bugfix][GGUF] Fix deepseek2 architecture not supported when loading without config.json
bug
deepseek
#42232 opened 2026-05-10 15:10 by AjAyrAo43
[Distributed] SymmMem fused allreduce+RMSNorm: portable Triton kernel
performance
#42230 opened 2026-05-10 14:09 by LeoYangXY
[Model] Add Mistral3 MM LoRA token helper methods
multi-modality
mistral
#42228 opened 2026-05-10 13:04 by coldpark
[Bugfix] Fix queue cleanup deadlock in multi-turn benchmark
bug
performance
#42227 opened 2026-05-10 12:09 by idan-friedman
[CPU] Fix rotary embedding for CPU without flash-attn ops
#42225 opened 2026-05-10 12:01 by jmamou
[MM][CG] Enable encoder Cudagraph for Step3VL
documentation
v1
multi-modality
nvidia
#42224 opened 2026-05-10 11:12 by JisoLya
[CI] Narrow basic_correctness.yaml source dependencies
ci/build
#42223 opened 2026-05-10 10:31 by khluu
[CI] Narrow models_language.yaml source dependencies
ci/build
#42222 opened 2026-05-10 10:30 by khluu