vllm-project/vllm
Commits
remove-petit-nvfp4
remove
Robert Shaw
committed
23 days ago
177d973b
[XPU]Support AgRsAll2AllManager on XPU device (#32654)
ys950902
committed
23 days ago
Verified
13f6630a
[4/N] Initialize MM components in context managers (M-P) (#32663)
DarkLight1337
committed
23 days ago
Verified
fda3f03e
[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric (#32661)
carlory
committed
23 days ago
Verified
bb917203
[Bugfix] Fix the fp8_mqa_logits dim mismatch (#32652)
chaunceyjiang
committed
23 days ago
Verified
c4e5bdf6
[3/N] Initialize MM components in context managers (I-L) (#32650)
DarkLight1337
committed
23 days ago
Verified
7f1bcd18
[Core] Cleanup shm based object store on engine shutdown (#32429)
walterbm
committed
23 days ago
Verified
8be263c3
[2/N] Initialize MM components in context managers (E-H) (#32641)
DarkLight1337
committed
23 days ago
Verified
e1a34c3a
[Refactor] Make FP8 Linear Ops use kernel abstraction (#27814)
vllmellm
committed
23 days ago
Verified
148117ea
[Model Runner V2] Skip kernel launch for penalties & logit_bias (#32634)
WoosukKwon
committed
23 days ago
Verified
e9c83cdc
[1/N] Initialize MM components in context managers (A-D) (#32632)
DarkLight1337
committed
23 days ago
Verified
b75e85de
[Model] Use context managers for encoder- and LM-only mode (#32605)
DarkLight1337
committed
23 days ago
Verified
4753f3bf
[Model Runner V2] Decouple temperature from penalties (#32629)
WoosukKwon
committed
23 days ago
Verified
6c01ffb8
[Model Runner V2] Refactor get_cudagraph_and_dp_padding (#32625)
WoosukKwon
committed
23 days ago
Verified
7b7cdce9
[Feat] allow inplace loading lora (#31326)
Jackmin801
committed
23 days ago
Verified
12dab78f
[Model Runner V2] Initialized communication buffer for DP (#32624)
WoosukKwon
committed
23 days ago
Verified
05dc4bfa
[Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32615)
MatthewBonanni
committed
23 days ago
Verified
1a1fc3bb
[Model Runner V2] Refactor `dummy_run` (#32533)
WoosukKwon
committed
23 days ago
Verified
43fada53
feat: spec decode with draft models (#24322)
tomasruizt
committed
23 days ago
Verified
4a5299c9
docs: prefix caching seems quite outdated (#28784)
longregen
committed
23 days ago
Verified
73f2a81c
[BugFix] Fix TRT-LLM NVFP4 DP/EP (#32349)
jiahanc
committed
23 days ago
Verified
73503317
[CI] Add Helion as an optional dependency (#32482)
gmagogsfm
committed
24 days ago
Verified
9d1e611f
[BUGFIX] Fix `test_mla_backends.py`. Scale MLA projection weights to prevent numerical instability (#32529)
vadiklyutiy
committed
24 days ago
Verified
0727cc9e
[CI][amd] Revert NIXL connector change to avoid crash (#32570)
qli88
committed
24 days ago
Verified
a0490be8
support dynamic resolution image encoding for Nemotron Nano VL (#32121)
netanel-haber
committed
24 days ago
Verified
cd3ac5b7
[Misc] Remove unused ModelKeys (#32608)
jeejeelee
committed
24 days ago
Verified
2636d762
Add support for LoRA adapters in Nemotron-H models (#30802)
danisereb
committed
24 days ago
Verified
aa7f37cc
[Frontend] Score entrypoint support data_1 & data_2 and queries & documents as inputs (#32577)
noooop
committed
24 days ago
Verified
c88860d7
[NIXL][Metrics] Track `nixl_num_kv_expired_reqs` metric in Prometheus (#32340)
NickLucche
committed
24 days ago
Verified
758df5af
[CI/Build] Fix dependency conflict between model-hosting-container-standards and starlette (#32560)
DanielMe
committed
24 days ago
Verified
cdd03d25