vllm-project/vllm
Commits
minus_x
Remove executable flag on a few files
tlrmchlsmth
committed
269 days ago
f8768f52
[Kernels] MoE refactor (#19636)
bnellnm
committed
269 days ago
Verified
c1909e7e
Documentation update tool_calling: mapping back to function from response (#20373)
cronoik-inceptionai
committed
269 days ago
Verified
b9587750
[Model] Adds support for SlimMoE models Phi-tiny-MoE-instruct (#20286)
zichongli5
committed
269 days ago
Verified
706ff132
[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models (#20322)
huaqiangwang
committed
269 days ago
Verified
ccbfb1d1
[NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) (#17280)
kaln27
committed
269 days ago
Verified
9e5552aa
[Build/CI] Automatically tag DeepSeek related PRs (#20370)
houseroad
committed
269 days ago
Verified
0c600b9a
[Model] Add Ernie4.5 and Ernie4.5MoE Model Support (#20220)
CSWYF3634076
committed
270 days ago
Verified
e303dcf5
[Docs] Make TPU ref prettier in google_tpu.md (#20356)
windsonsea
committed
270 days ago
Verified
ae9c4d41
[Docs] Fix indentations for 2-level items in deprecation_policy.md (#20352)
windsonsea
committed
270 days ago
Verified
d853520b
[Bugfix] Keye-VL compatibility with `tok_kwargs` (#20058) (#20353)
DarkLight1337
committed
270 days ago
Verified
ba51aea6
[Model][VLM] Support Keye-VL-8B-Preview (#20126)
Kwai-Keye
committed
270 days ago
Verified
8452946c
[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (#20105)
huachenheli
committed
270 days ago
Verified
2e7cbf2d
[TPU] kv cache update kernel supports dynamic grid (#20235)
Chengji Yao
committed
270 days ago
Verified
7da296be
[Doc][TPU] Add models and features supporting matrix. (#20230)
QiliangCui
committed
270 days ago
Verified
b205e846
fix[Docs]: link anchor is incorrect #20309 (#20315)
yyzxw
committed
270 days ago
Verified
be0cfb2b
[Bugfix] Fix dynamic rotary embedding (#20343)
DarkLight1337
committed
270 days ago
Verified
1a03dd49
[FIX][Intel GPU]fix ipex flash_attn_varlen_func api missing parameter (#20348)
jikunshang
committed
270 days ago
Verified
27b80176
[Misc][Doc] Add missing comment for LLM (#20285)
draftbk
committed
270 days ago
Verified
9ec1e306
[Refactor] Remove Unused Env `VLLM_ENABLE_MOE_ALIGN_BLOCK_SIZE_TRITON` (#20334)
yewentao256
committed
270 days ago
Verified
9dae7d46
[Refactor] Remove duplicate `find_free_port` (#20333)
yewentao256
committed
270 days ago
Verified
7058d7dd
[UT][intel GPU] use current_platform instead of device hardcode in v1 tests (#20169)
Liangliang-Ma
committed
270 days ago
Verified
a0389e05
[Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8 (#20324)
tlrmchlsmth
committed
270 days ago
Verified
3be8d312
Enable group size 64 for Machete (#20290)
czhu-cohere
committed
270 days ago
Verified
3abfe221
[Refactor] Refactor import utils (#20269)
yewentao256
committed
270 days ago
Verified
e81fbefe
remove unused variables in marlin_template.h (#20236)
zhoutianzi666
committed
270 days ago
Verified
9290de56
[Optimization] Cache sampled token ids in model runner (#20291)
WoosukKwon
committed
270 days ago
Verified
7f280d69
[V1] [ROCm] Enable EP with AITER Fused MoE (#20270)
tjtanaa
committed
270 days ago
Verified
02cabff2
[Frontend] Expand tools even if tool_choice="none" (#17177)
okdshin
committed
270 days ago
Verified
3d19d47d
[CUDA graphs] Enable full cuda graphs with FA3 AoT scheduling (#20301)
WoosukKwon
committed
270 days ago
Verified
8acb4bad