vllm-project/vllm
Commits
refactor-modelopt-fp8-modular-kernel
Refactor ModelOptFp8MoEMethod to use modular kernels
Robert Shaw
committed
90 days ago
5284a65b
progress towards single interface
Robert Shaw
committed
92 days ago
fc6fa84b
progress towards single interface
Robert Shaw
committed
92 days ago
453213f6
updated to use workspaces
Robert Shaw
committed
93 days ago
919d679e
initial commit
Robert Shaw
committed
93 days ago
2b175179
[ROCm] Serving Fails on Radeon Due to AITER Dtype Import (#30952)
vllmellm
committed
93 days ago
Verified
96bf50a2
[Bugfix][CPU] Fix Mac CPU build (#30955)
bigPYJ1151
committed
93 days ago
Verified
f90d3636
[moe] Use enable_chunking func (to support disabling chunking) (#29935)
minosfuture
committed
93 days ago
Verified
8372be28
[ROCm][Bugfix] Fix `fa_version` argument error in `flash_attn_maxseqlen_wrapper` for ROCm without aiter (#30909)
AndreasKaratzas
committed
93 days ago
Verified
8da6ae49
[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) (#30910)
LucasWilkinson
committed
93 days ago
Verified
30bb19a7
[Bugfix] Fix Unicode issues in GLM-4 tool calling (#30920)
chaunceyjiang
committed
93 days ago
Verified
aa7e8360
[ROCm][Bugfix] fix(structured_output): Skip guidance backend for schemas with patternProperties (#30730)
AndreasKaratzas
committed
93 days ago
Verified
be2ad5f9
[Platform] Let EPD work with non-cuda platform (#30225)
wangxiyuan
committed
93 days ago
Verified
a85724bd
[Fix][FlexAttention] return max logical block index to handle reused blocks (#30915)
ivanium
committed
93 days ago
Verified
11a89cf9
[CPU] Refactor CPU fused MOE (#30531)
bigPYJ1151
committed
93 days ago
Verified
e3ab93c8
fix: add warmup for audio preprocessing (#30706)
TheCodeWrangler
committed
93 days ago
Verified
fc2ae6d6
[KV connector][LMCache] Only record the cuda event when there are request to store/load (#30814)
ApostaC
committed
93 days ago
Verified
ec965569
[AMD][CI] fix lm eval ci arg (#30911)
divakar-amd
committed
93 days ago
Verified
82dc338a
[PERF] Qwen3-next. Add fp8 cutlass MoE tuned configs. `chmod -x *MI308X.json` (#29553)
vadiklyutiy
committed
93 days ago
Verified
717ac33d
[Doc][CPU] Update CPU doc (#30765)
bigPYJ1151
committed
93 days ago
Verified
cfb7e555
[refactor] Add prefix support to embed_tokens in DeepSeek MTP (#30788)
zzhx1
committed
93 days ago
Verified
b166ef20
[compile] Fix CI for test_gpt2_cache_hit (#30902)
zhxchen17
committed
93 days ago
Verified
5f2f3fba
[UX] Reduce DeepGEMM warmup log output to single progress bar (#30903)
MatthewBonanni
committed
93 days ago
Verified
4a8412f7
[Quantization] Support Quark int4-fp8 w4a8 for MoE (#30071)
BowenBao
committed
93 days ago
Verified
0c738b58
fused_moe_lora PDL improvements (#30716)
gnovack
committed
93 days ago
Verified
5a3adf58
[Chore] Remove v0 dead code for Qwen2.5-omni (#30883)
Isotr0py
committed
93 days ago
Verified
6fe58876
[NIXL] Support P tensor-parallel-size > D tensor-parallel-size (#27274)
NickLucche
committed
93 days ago
Verified
bc3700e0
[ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 (#30811)
micah-wil
committed
93 days ago
Verified
fd8afdf3
[Metrics] Model FLOPs Utilization estimation (#30738)
SungMinCho
committed
93 days ago
Verified
a0b782f9
[CI][Feature] Adds auto-rebase PR rule (#30875)
rafvasq
committed
93 days ago
Verified
ed2897f3