vllm-project/vllm
Commits
woosuk/test-router
skip detokenize
Woosuk Kwon
committed
151 days ago
cb439737
Turn off usage
Woosuk Kwon
committed
151 days ago
a1cac484
Fix oom
Woosuk Kwon
committed
151 days ago
6102536d
mem
Woosuk Kwon
committed
151 days ago
f65da69c
Fix uv error from tvm-ffi
Woosuk Kwon
committed
153 days ago
a5281395
Remove /generate API
Woosuk Kwon
committed
153 days ago
eda71c28
Add /generate API
Woosuk Kwon
committed
154 days ago
1bff9a59
disable flashinfer warmup
Woosuk Kwon
committed
158 days ago
69c9a015
Merge branch 'main' into woosuk/test-router
Woosuk Kwon
committed
158 days ago
8935ca20
[ci] Adjusting AMD test composition 2025-10-14 (#26852)
Alexei-V-Ivanov-AMD
committed
158 days ago
Verified
938c43ea
Move query quantization to attention layer for Flashinfer & Triton. (#26534)
adabeyta
committed
159 days ago
Verified
0a9ef0cf
[Bug] Temporally Disable `VLLM_ALLREDUCE_USE_SYMM_MEM` by Default (#26925)
yewentao256
committed
159 days ago
Verified
e5b438a2
support flashinfer_fp4 moe for 5090 gpu (#26669)
XiaobingSuper
committed
159 days ago
Verified
0b99f5d3
Vectorize RMS norm variance using vectorize_read_with_alignment (#26234)
bbeckca
committed
159 days ago
Verified
1f491aa0
[NVIDIA] Add support for cudnn fp4 gemm via flashinfer (#26107)
kaixih
committed
159 days ago
Verified
de92d916
[Chore] Clean up CODEOWNERS (#26923)
WoosukKwon
committed
159 days ago
Verified
a1063628
[ModelOpt] Remove NVFP4 MoE K%16==0 constraint (#26891)
XiaobingSuper
committed
159 days ago
Verified
d7963752
[Feature]: Use pydantic validation in observability.py config (#26637)
cern1710
committed
159 days ago
Verified
14f84563
Olmo 3 tool parser and tests (#26143)
pdasigi
committed
159 days ago
Verified
4794c2bd
Lower sevarity of log when model info cache misses due to exception (#26917)
hmellor
committed
159 days ago
Verified
d3cbaa08
[Chore] Separate out `vllm.utils.async_utils` (#26913)
DarkLight1337
committed
159 days ago
Verified
828523ad
[Chore] Separate out `vllm.utils.func` (#26904)
DarkLight1337
committed
159 days ago
Verified
136a17fe
[BugFix] Patch inductor memory plan logic (#26878)
BoyuanFeng
committed
159 days ago
Verified
f5743833
chore: remove unused marker (#26890)
max-wittig
committed
159 days ago
Verified
5d598680
[Misc] rename torch_dtype to dtype (#26695)
wangxiyuan
committed
159 days ago
Verified
8f4b313c
[Misc] Remove `isort` and `yapf` ignores (#26888)
DarkLight1337
committed
159 days ago
Verified
f93e3480
[Model][2/N] Improve all pooling task | Support multi-vector retrieval (#25370)
noooop
committed
159 days ago
Verified
f54f8512
[Lora]Load tuned multi-lora kernel configs from json files (#26319)
li2haipeng
committed
159 days ago
Verified
d4d1a602
[Platform] allow platform to init dp group (#22243)
wangxiyuan
committed
159 days ago
Verified
db1764e4
[Easy] Get rid of unnecessary paraenthesis in kv_cache_manager (#26842)
Jialin
committed
159 days ago
Verified
7f83b4ee