vllm-project/vllm
Open Pull Requests
[torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation
#36093 opened 2026-03-05 04:58 by zou3519
[ROCm] Fix AITER ops fake impl and minor bugs
Labels: rocm
#36092 opened 2026-03-05 04:57 by ChuanLi1101
[ROCm][CI] Making some tests optional to reduce workload
Labels: rocm, ci/build
#36090 opened 2026-03-05 04:46 by AndreasKaratzas
[Bugfix] Handle TimeoutError in Voxtral buffer_realtime_audio to prevent silent hang
Labels: bug
#36089 opened 2026-03-05 04:32 by OiPunk
Don't fire ray compatibility webhook when PR or branch is not provided
Labels: ci/build
#36088 opened 2026-03-05 04:22 by jeffreywang-anyscale
[AMD][Build] Add DeepEP to ROCm Dockerfile
Labels: rocm, ci/build
#36086 opened 2026-03-05 04:15 by rjrock
[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize`
Labels: documentation, performance, ready, v1, nvidia, ready-run-all-tests
#36085 opened 2026-03-05 03:57 by jikunshang
Add adaptive decode chunking for SM100 fused TRTLLM path (TMP FIX) #34988
Labels: v1, nvidia
#36083 opened 2026-03-05 03:46 by baonudesifeizhai
Perf: Optimize DeepEP prepare/finalize for identity mapping
#36081 opened 2026-03-05 03:11 by xueliangyang-oeuler
[PluggableLayer][4/N] Apply PluggableLayer to remaining layers
#36080 opened 2026-03-05 03:09 by whx-sjtu
Yejin/bench sleep wake timeout
Labels: performance, frontend, tpu, needs-rebase, v1, cpu
#36079 opened 2026-03-05 02:59 by YJYJLee
Enable ModelRunnerV2 on XPU
Labels: v1
#36078 opened 2026-03-05 02:11 by xinyu-intel
Revert "[Hardware] Replace `torch.cuda.empty_cache` with `torch.accelerator.empty_cache`" (#30681)
Labels: documentation, performance, structured-output, v1, nvidia
#36076 opened 2026-03-05 02:05 by zhewenl
[Docs] Add doc note about building for free-threaded Python.
Labels: documentation
#36074 opened 2026-03-05 01:45 by nascheme
[WIP][Proof of concept] Overlap model loading and torch.compile
Labels: frontend, v1
#36072 opened 2026-03-05 01:35 by zou3519
[Bugfix][DCP] Fix CUDA graph capture for Decode Context Parallelism
Labels: bug, v1, nvidia
#36070 opened 2026-03-05 01:01 by sungsooha
fix(lora): bounds-check lora_a/lora_b in MergedColumnParallelLinear.set_lora
#36069 opened 2026-03-05 00:43 by JackYoung27
[Bugfix] Allow inherited_fds to be None to fix warnings when using spawn
Labels: bug, v1
#36068 opened 2026-03-05 00:30 by tjohnson31415
set VLLM_USE_BYTECODE_HOOK to 0 by default
#36067 opened 2026-03-05 00:26 by laithsakka
test Qwen/Qwen3-4B-Instruct-2507 for unbacked
Labels: qwen
#36064 opened 2026-03-05 00:00 by laithsakka
[Refactor] Consolidate SupportsEagle
Labels: v1, llama, qwen, gpt-oss, kv-connector
#36063 opened 2026-03-04 23:54 by benchislett
[Kernel] [Helion] [11/N] Retune configs for silu_mul_fp8
Labels: ready
#36062 opened 2026-03-04 23:51 by gmagogsfm
[Bugfix] Fix DP/EP Shared Expert With Monolithic Kernels
Labels: bug
#36061 opened 2026-03-04 23:51 by robertgshaw2-redhat
fix: force prefill path for MTP drafting on SM121 (GB10 Spark)
Labels: v1, nvidia
#36060 opened 2026-03-04 23:34 by scottgl9
[BugFix] Fallback from FA4->FA2 for Batch Invariance
Labels: bug, v1
#36059 opened 2026-03-04 23:08 by frankwang28
[2/n] Migrate per_token_group_quant to torch stable ABI
Labels: ci/build, nvidia
#36058 opened 2026-03-04 23:06 by mikaylagawarecki
Adding deterministic lora benchmarking to vLLM Bench
Labels: performance
#36057 opened 2026-03-04 22:58 by RonaldBXu
[Bugfix] Fix Deepseekv32 tool parser when stream interval > 1
Labels: bug, deepseek
#36056 opened 2026-03-04 22:48 by sfeng33
[Bugfix] Fix zombie EngineCore processes after parent exit
Labels: bug, v1
#36055 opened 2026-03-04 22:46 by AjAnubolu
[Bugfix] Fix tokenize endpoint malformed token_strs
Labels: bug, frontend
#36054 opened 2026-03-04 22:41 by AjAnubolu