vllm-project/vllm
Commits
remove_mamba_ssm
fix
tlrmchlsmth
committed
274 days ago
ddb65dad
Remove mamba-ssm package
tlrmchlsmth
committed
274 days ago
c41ea526
[gpt-oss] Enhance error msg on attention sink init (#22335)
zyongye
committed
274 days ago
Verified
31f5dc5b
[gpt-oss] Add loop for built-in tool call (#22374)
WoosukKwon
committed
274 days ago
Verified
ec7cb192
[Bugfix] Make condition in triton kernel constexpr (#22370)
gshtras
committed
274 days ago
Verified
2435ea7e
[BugFix] Fix triton compile error in `kernel_unified_attention_2/3d` caused by attention sinks (#22368)
LucasWilkinson
committed
274 days ago
Verified
4a6b72c2
add the codes to check AMD Instinct GPU number (#22367)
zhangnju
committed
275 days ago
Verified
b4b9813b
[BugFix] Fix FA2 RuntimeError when sinks is provided (#22365)
LucasWilkinson
committed
275 days ago
Verified
2cb6ef89
[Minor] Fix type (#22347)
WoosukKwon
committed
275 days ago
Verified
9edd1db0
[gpt-oss] Support chat completion api (#22342)
WoosukKwon
committed
275 days ago
Verified
f263a4b5
[gpt-oss] add model to supported models doc (#22336)
Roger Wang
committed
275 days ago
Verified
54991c54
[gpt-oss] Add Tool/ConversationContext classes and harmony_utils (#22340)
WoosukKwon
committed
275 days ago
Verified
178d03fb
[Misc] Clean up duplicated hf overrides (#22311)
Isotr0py
committed
275 days ago
Verified
fa00c5d7
[gpt-oss] Add openai-harmony as default dependency (#22332)
WoosukKwon
committed
275 days ago
Verified
134a8ee8
[gpt-oss] flashinfer attention sink init (#22330)
zyongye
committed
275 days ago
Verified
90ec0069
[GptOss] Add GptOss reasoning parser to support structure output (#22322)
heheda12345
committed
275 days ago
Verified
a47e6ffe
[ROCm] Add attention sink to use_rocm_custom_paged_attention (#22329)
WoosukKwon
committed
275 days ago
Verified
98a3a810
Add GPT-OSS model code and config [1/N] (#22327)
WoosukKwon
committed
275 days ago
Verified
de98252f
Update transformers to `v4.55` (#21931)
hmellor
committed
275 days ago
Verified
796bae07
Add attention sink in attention backends (#22320)
WoosukKwon
committed
275 days ago
Verified
6e209243
Increase openai-python version (#22316)
WoosukKwon
committed
275 days ago
Verified
dd16bdc7
Upgrade FA3 for attention sink (#22313)
WoosukKwon
committed
275 days ago
Verified
e3c876dc
[Bugfix][CI/Build][ROCm] Make sure to use the headers from the build folder on ROCm (#22264)
gshtras
committed
275 days ago
Verified
5d5d419c
[Bugfix] Skip dead and non-GPU nodes for Ray DP engine allocation (#22275)
ruisearch42
committed
275 days ago
Verified
302962e8
[Perf] Parallelize fill_bitmask to accelerate high-throughput guided decoding (#21862)
benchislett
committed
275 days ago
Verified
7e6544c7
[Bugfix] Fix MoE BNB version (#22260)
jeejeelee
committed
275 days ago
Verified
8e6c7e87
[Bugfix] Fix 3D input passed into cutlass_scaled_mm (#22278)
mgoin
committed
275 days ago
Verified
6a515304
[Bugfix] Remove faulty test for oot attention backend (#22286)
mgoin
committed
275 days ago
Verified
35509fc5
[CI][TPU] Fix docker clean up (#22271)
lsy323
committed
275 days ago
Verified
4b29d278
[bugfix] fix blackwell deepep installation (#22255)
youkaichao
committed
275 days ago
Verified
59a0b855