vllm-project/vllm
Open Pull Requests
implement issue #20711
#20879 opened 2025-07-13 11:51 by kiscad

[Refactor][V1] Move outlines utils for V1 imports
Labels: structured-output, ready, v1
#20878 opened 2025-07-13 11:51 by aarnphm

[Benchmark] Add expert parallel support to MoE benchmark
Labels: performance
#20876 opened 2025-07-13 09:12 by Chen-zexi

[Frontend] OpenAI Responses API supports Tool/Function calling
Labels: frontend, v1, tool-calling, llama, deepseek
#20874 opened 2025-07-13 08:00 by chaunceyjiang

[Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache
#20873 opened 2025-07-13 07:38 by charent

[Feature][EPLB] Add EPLB support for OLMoE
#20872 opened 2025-07-13 06:59 by ztang2370

[EPLB] Add EPLB support for dots1
#20870 opened 2025-07-13 06:32 by wenchen76

Allow serving Llama4ForCausalLM directly
Labels: new-model, llama
#20868 opened 2025-07-13 05:09 by sarckk

nit
Labels: llama, qwen
#20864 opened 2025-07-12 22:30 by py-andy-c

Fused moe tuning ep
Labels: performance
#20863 opened 2025-07-12 22:04 by robertgshaw2-redhat

[Feature] limit thinking tokens
Labels: frontend, v1, deepseek
#20859 opened 2025-07-12 09:22 by llsj14

[Misc] Relax translations tests
#20856 opened 2025-07-12 07:49 by NickLucche

[Nvidia] Integrate cudnn prefill paged attention kernel for head_dim == 128 models, like Llama family
Labels: needs-rebase, v1, llama
#20850 opened 2025-07-12 00:19 by elfiegg

[WIP] Enable xpu sleep mode
Labels: v1
#20848 opened 2025-07-11 23:09 by yangw1234

[Docs] Update supported models documentation with missing models
Labels: documentation
#20844 opened 2025-07-11 22:07 by luccafong

[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli
Labels: v1
#20840 opened 2025-07-11 20:04 by tdoublep

[Model] Replace Mamba2 RMSNorm Gated with Fused Triton Kernel
#20839 opened 2025-07-11 19:38 by cyang49

[BugFix] fix two issues: using metadata for causal-conv1d and init_states in v0
#20838 opened 2025-07-11 19:30 by thoangtrvn

[Frontend] Add chunked processing to handle long inputs in embedding models
Labels: documentation, frontend
#20837 opened 2025-07-11 18:58 by x22x22

[compile][startup] Disable C++ compilation of symbolic shapes
#20836 opened 2025-07-11 18:49 by anijain2305

Add DeepSeek V2/V3 model family to PP tests
Labels: deepseek
#20831 opened 2025-07-11 16:59 by eicherseiji

Switching attention backend to correctly complete LoRA and Quantized Model Tests
Labels: rocm, ci/build
#20829 opened 2025-07-11 16:55 by Alexei-V-Ivanov-AMD

[Bugfix] Fix Qwen2 audio chat template for old version transformers compatibility
Labels: frontend, qwen
#20826 opened 2025-07-11 16:05 by Isotr0py

[Bugfix] Fix the bug in Hermes streaming parsing
Labels: frontend, ci/build, tool-calling
#20824 opened 2025-07-11 15:39 by pre-master

[Model] Add ToolParser for Hunyuan A13B.
Labels: documentation, performance, frontend, tool-calling
#20820 opened 2025-07-11 14:05 by kzjeef

[Feature][EPLB] Add eplb support for Qwen3
Labels: qwen
#20815 opened 2025-07-11 12:11 by aladerran

[v1][core]Support for attention free models
Labels: v1
#20811 opened 2025-07-11 10:55 by christian-pinto

[Model] Add support for Jina Embeddings V4
Labels: documentation, performance, new-model, multi-modality
#20802 opened 2025-07-11 07:01 by sigridjineth

[FIX] bump mistral common to support devstral 2507
Labels: ci/build
#20801 opened 2025-07-11 06:28 by ptS10011

[Feature] Add command tool parser for Command-A model
Labels: frontend, tool-calling
#20800 opened 2025-07-11 06:19 by gjgjos