vllm-project/vllm
Commits
lease-refresh
revert spurious change
Robert Shaw
committed
34 days ago
275da3ca
refactor a bit
Robert Shaw
committed
34 days ago
a250ae33
update from nixl to internal
Robert Shaw
committed
34 days ago
4b554d19
humans are still needed to write code
Robert Shaw
committed
34 days ago
934224a2
[Feat][NIXL] Add KV lease refresh mechanism for disaggregated prefill
Robert Shaw
committed
35 days ago
38c00afb
[Docs] Add breadcrumbs for better UX (#35749)
hmellor
committed
35 days ago
Verified
7e9149d9
[MyPy][BugFix] Check profiler is assigned before calling start() on it (#35505)
hickeyma
committed
35 days ago
Verified
87c98b02
Fix unresolved-import errors when using Astral's ty by removing src.root (#35681)
tlrmchlsmth
committed
35 days ago
Verified
de7dd634
[Feat] Supports Anthropic Messages count_tokens API (#35588)
chaunceyjiang
committed
35 days ago
Verified
9a87b057
[Misc] Cleanup useless `current_platform` import (#35715)
wangxiyuan
committed
35 days ago
Verified
510bc9e1
[CPU][Distributed] Fix Enable _CPUSHMDistributed only when TP/PP ranks share the same SHM group name (#34169)
charlesashby
committed
35 days ago
Verified
cbd361fd
[Misc] Bound NIXL upper bound version (#35495)
NickLucche
committed
35 days ago
Verified
c212202d
[CI] Defining extended V1 e2e + engine tests (#35580)
AndreasKaratzas
committed
35 days ago
Verified
ec27b36b
[Rocm][CI] Fix LM Eval Large Models (H100) test group (#34750)
charlifu
committed
35 days ago
Verified
3fd1d4ec
[Kernel] Integrate SM100 MXFP8 blockscaled grouped MM and quant kernels (#34448)
EdalatiAli
committed
35 days ago
Verified
cb21972a
[ROCm][CI] Disable skinny GEMMs in language model standard tests to fix non-determinism (#35152)
AndreasKaratzas
committed
35 days ago
Verified
c34963f1
[ROCm] add amd-quark package in requirements for rocm to use quantized models (#35658)
hongxiayang
committed
35 days ago
Verified
f26650d6
[XPU] fix mxfp4 activation type (#35691)
jikunshang
committed
35 days ago
Verified
92f5d0f0
Fix deprecated v1 config tests (#35327)
jcaip
committed
35 days ago
Verified
a60985b0
[Attention] FA4 integration (#32974)
LucasWilkinson
committed
36 days ago
Verified
8b5014d3
Revert "[Bugfix] Disable TRTLLM attention with KV transfer enabled (#33192)" (#34832)
ZhanqiuHu
committed
36 days ago
Verified
57a96e26
[torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 (#35475)
zou3519
committed
36 days ago
Verified
e82fbeec
[Bugfix] Fix dtype mismatch in RMSNormGated.forward_native() during torch.compile (#35256)
haosdent
committed
36 days ago
Verified
62904708
[Model Runner V2] Use block table apis for capture inputs (#35671)
WoosukKwon
committed
36 days ago
Verified
72f4d162
fix(mxfp4): return is_monolithic=False when LoRA is enabled for Triton backend (#35382)
yoonsnowdev
committed
36 days ago
Verified
5a435507
[MISC] Fixing a null reference by removing parallel_utils from mypy EXCLUDE (#35630)
taneem-ibrahim
committed
36 days ago
Verified
59d7af9c
[Mamba1] - Kernel Level Chunk Alignment for Prefix Caching (#34798)
Josephasafg
committed
36 days ago
Verified
bbf81f9a
[Model Runner V2] Minor refactoring for EncoderRunner (#35628)
WoosukKwon
committed
36 days ago
Verified
da543d1a
[AMD][CI] Support Triton attention with ExampleConnector (#34931)
rjrock
committed
36 days ago
Verified
87d319c5
Fix typo: implictly -> implicitly in isaac.py docstring (#35646)
lin-shh
committed
36 days ago
Verified
a9ec392c