vllm-project/vllm
Pull Requests
Open
[Misc][Quantization] Clarify the intent of GGUF `FusedMoE` weight materialization
ready
#30310 opened 2025-12-09 06:40 by
a4lg
Fix incomplete response generation for tool call outputs
frontend
deepseek
fb-exported
meta-exported
#30304 opened 2025-12-09 05:01 by
qandrew
[Misc] Pass reasoning to deepseekV32 tokenizer
frontend
deepseek
#30302 opened 2025-12-09 04:48 by
kingsmad
[ResponsesAPI] Add GPTOSS MCP tool streaming
frontend
gpt-oss
#30301 opened 2025-12-09 04:36 by
qandrew
[Bugfix] Update WSL detection to check for WSL1 compatibility as WSL2…
#30299 opened 2025-12-09 03:59 by
HoneyBerries
[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality
rocm
#30292 opened 2025-12-09 03:07 by
rasmith
[CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementation on ROCm.
rocm
#30291 opened 2025-12-09 02:53 by
rasmith
[Core] Add token-level KV cache metrics to V1 engine
v1
#30289 opened 2025-12-09 02:19 by
Minsung-commit
Adding quantized fused_moe_lora support
#30286 opened 2025-12-09 01:22 by
yugong333
Ensure minimum frames for GLM 4.6V compatibility
ready
#30285 opened 2025-12-09 00:23 by
gh-wf
[BugFix] Lazy tokenizer init in StructuredOutputManager to prevent GGUF semaphore leak
structured-output
v1
#30284 opened 2025-12-09 00:00 by
kitaekatt
[Small] Add comment for `parallel_config` in `FusedMoEModularKernel`
#30282 opened 2025-12-08 22:32 by
yewentao256
[CI/Build] Ignore data_parallel_size_local
#30281 opened 2025-12-08 22:05 by
rjrock
Add `moe_align_block_size_no_permute` for small batch size with large num_expert
needs-rebase
#30280 opened 2025-12-08 21:49 by
RunkaiTao
[CPU][Bugfix] Fix CPU Profiler issue
needs-rebase
v1
#30278 opened 2025-12-08 21:31 by
zhili03
[ROCM][CI] Fix AMD Examples Test Group
documentation
rocm
ready
ci/build
#30276 opened 2025-12-08 19:35 by
Concurrensee
[NIXL] refine decoder side post process for heterogeneous BlockSize and kv_layout
v1
kv-connector
#30275 opened 2025-12-08 19:34 by
xuechendi
[AMD] Amd/deepseek aiter fusions
rocm
needs-rebase
v1
deepseek
#30274 opened 2025-12-08 19:33 by
k50112113
[Bugfix] Temporarily disable group quant rms norm fusion
#30273 opened 2025-12-08 18:34 by
ElizaWszola
[CI/Build] Use spawn subprocess for ROCm
documentation
rocm
#30272 opened 2025-12-08 18:16 by
rjrock
[ROCm][CI][Bugfix] Multi-Modal Model Support Fixes and Attention Backend Improvements
rocm
needs-rebase
ci/build
multi-modality
qwen
#30270 opened 2025-12-08 17:00 by
AndreasKaratzas
[Frontend] Fixes anthropic streaming message_start usage nesting
frontend
ready
#30266 opened 2025-12-08 15:58 by
bbartels
Fix 500 /tokenize errors and 400 v1/chat/completions errors when using truncate_prompt_tokens and sending /tokenize and v1/chat/completions requests under high concurrency
frontend
#30264 opened 2025-12-08 14:41 by
Soufiane-Ra
Multiple Hybrid KV Cache Coordinator
v1
#30263 opened 2025-12-08 14:09 by
roikoren755
Support TP which is not divided for NVFP4 kernels (flashinfer-cutlass) by adding dynamic padding
nvidia
#30260 opened 2025-12-08 13:25 by
danielafrimi
[Feature]: OpenTelemetry Metrics Support
v1
#30258 opened 2025-12-08 11:45 by
mladjan-gadzic
[ROCm] Use aiter.topk_sigmoid in llama4
rocm
needs-rebase
llama
#30255 opened 2025-12-08 11:07 by
tpopp
gptq marlin quantization support for fused moe with lora
#30254 opened 2025-12-08 10:33 by
Bhanu068
[gpt-oss] Add model_identity to system message retrieval for harmony chat template
frontend
gpt-oss
#30247 opened 2025-12-08 08:43 by
lyuwen
[Bugfix] Fix fusion for VL models
#30244 opened 2025-12-08 07:47 by
ElizaWszola