vllm-project/vllm
Pull Requests
Open
[Bugfix] Gemma 4: Fix bug around invalid JSON diffs during tool usage
bug
tool-calling
#38945 opened 2026-04-03 21:02 by
Brummi
Re-enable Inductor pre-grad passes in standalone compile (torch>=2.12)
ready
#38944 opened 2026-04-03 20:45 by
frgossen
[BUG] Fix PP for R3
bug
v1
#38943 opened 2026-04-03 20:42 by
hao-aaron
[vLLM IR] Avoid redundant file reads in IrOpImpl.uuid()
vllm-ir
#38940 opened 2026-04-03 19:45 by
sBobHuang
[R3] Add routed experts to openai entrypoint
frontend
#38939 opened 2026-04-03 19:25 by
hao-aaron
Bug/test eagle dp v0
bug
ready
v1
#38938 opened 2026-04-03 19:19 by
Monishver11
[ROCm][CI] Added back missing common deps
rocm
ready
ci/build
#38937 opened 2026-04-03 19:16 by
AndreasKaratzas
[PD][HeteroArch]Fix accuracy issue with CPU_ATTN as Decoder and Flash_ATTN as prefiller
intel-gpu
cpu
kv-connector
#38935 opened 2026-04-03 18:58 by
xuechendi
[Performance Improvement] Update `batched_count_greater_than` to handle batch size 1 without recompile
v1
#38933 opened 2026-04-03 18:56 by
Lucaskabela
[Bugfix][Perf] Indexer upcast WK to BF16 for fusion
bug
deepseek
#38928 opened 2026-04-03 18:06 by
benchislett
[Bugfix][LoRA] Fix missing in_proj_z in Qwen3_5ForConditionalGenerati…
bug
ready
qwen
#38927 opened 2026-04-03 17:59 by
elenalil-aws
[Bugfix] Fix broken explicit unquantized kv cache dtype support
bug
#38922 opened 2026-04-03 16:22 by
Isotr0py
[ROCm][CI] Move skipped tests out of run-amd-test.sh
rocm
ci/build
#38921 opened 2026-04-03 16:16 by
micah-wil
[Docs] add cache directory security guidance
documentation
#38920 opened 2026-04-03 15:44 by
russellb
[Bugfix] Runtime driver check for cuMemcpyBatchAsync in swap_blocks_batch
bug
#38919 opened 2026-04-03 15:18 by
Etelis
[Bugfix] Fix Qwen3.5 LoRA activation for shared expert modules
bug
qwen
#38917 opened 2026-04-03 14:52 by
jayden222
[Bug] Fix compile error for `swap_blocks_batch` in CUDA 13
bug
ready
nvidia
#38915 opened 2026-04-03 13:58 by
yewentao256
[ROCm] mi250x decode regression
rocm
#38914 opened 2026-04-03 13:37 by
rlrs
[NVIDIA] Update FlashInfer to version 0.6.7.post1. Avoid re-downloading BMM export headers when flashinfer-cubin is installed
ready
ci/build
nvidia
ready-run-all-tests
#38913 opened 2026-04-03 11:34 by
johnnynunez
[Bugfix][Frontend] Fix Gemma4 streaming HTML duplication after tool calls
bug
tool-calling
#38909 opened 2026-04-03 11:03 by
yoke233
Fix the order of _free_encoder_inputs
v1
#38907 opened 2026-04-03 10:40 by
gty111
[Core] Make handshake timeout configurable
v1
#38906 opened 2026-04-03 10:10 by
foraxe
refactor hard coded device string in test files under tests/compile tests/quantization tests/models and tests/model_executor
v1
multi-modality
#38901 opened 2026-04-03 09:30 by
wincent8
[P/D][Feature] Support kv_transfer_params for parallel sampling (n>1)
frontend
v1
kv-connector
#38900 opened 2026-04-03 09:24 by
chaunceyjiang
[XPU] [CT] Enable CT W4A4MxFp4 path and add xpu kernel
intel-gpu
#38896 opened 2026-04-03 08:36 by
zufangzhu
bugfix(flashinfer,dcp): remove kv_cache_layout for BatchDCPPrefillWrapper._new_tokens.
bug
v1
nvidia
#38895 opened 2026-04-03 08:35 by
pisceskkk
[Gemma4] Allow per-layer attention backend selection for heterogeneou…
#38891 opened 2026-04-03 07:48 by
CunXin1
[Bugfix] Fix logger.warning format string arg mismatch in Qwen3XML tool parser
bug
tool-calling
qwen
#38890 opened 2026-04-03 07:44 by
edenfunf
[Doc] add Apple Clang 21+ compilation troubleshooting
documentation
cpu
#38889 opened 2026-04-03 07:40 by
Alex-ai-future
[Multimodal] Add attention-score-based image token pruning for Qwen VL models
documentation
frontend
v1
multi-modality
qwen
#38888 opened 2026-04-03 07:00 by
shhn1