vllm-project/vllm
Commits
wna16-modular-kernel
fix pre-commit
Robert Shaw
committed
43 days ago
3bf491b5
next step: add support for scalartype to quantkey
Robert Shaw
committed
43 days ago
7b5b8923
making progress
Robert Shaw
committed
43 days ago
8a31a6f2
update for triton
Robert Shaw
committed
43 days ago
b264a674
update for marlin
Robert Shaw
committed
43 days ago
bc41d5dc
create mixed input file
Robert Shaw
committed
43 days ago
c2b99f53
Merge remote-tracking branch 'upstream/main' into wna16-modular-kernel
Robert Shaw
committed
43 days ago
808a0e1a
[CI][Pooling] Stabilize ModernBERT test (#32909)
AndreasKaratzas
committed
43 days ago
Verified
6c006457
[code clean] remove duplicate code (#33135)
andyxning
committed
43 days ago
Verified
b781eeaa
[Frontend] Cleanup serving engine (#33103)
DarkLight1337
committed
44 days ago
Verified
e0b005d9
[torch.compile] Stop assuming 32 bit indexing (#33113)
zou3519
committed
44 days ago
Verified
3b8f0fe5
[Frontend] Reduce mixin usage in serving pooling (#33101)
DarkLight1337
committed
44 days ago
Verified
c831911b
[Perf] avoid duplicate mem_get_info() call in get_current_memory_usage (#33064)
pacoxu
committed
44 days ago
Verified
157caf51
[DOC]: Add warning about max_num_batched_tokens and max_model_len when chunked prefill is disabled (#33109)
VincentG1234
committed
44 days ago
Verified
0b53bec6
Fix IndexError with encoder-decoder models when using Custom Paged Attention (#33112)
sstamenk
committed
44 days ago
Verified
c568581f
fix: preserve native tool call ID in multi-turn tool calling (#32768)
wangln19
committed
44 days ago
Verified
2d705343
[MoE Refactor] Integrate Naive Prepare Finalize into MK (#32567)
robertgshaw2-redhat
committed
44 days ago
Verified
5a93b916
[Model Runner V2] Remove UvaBufferPool for cpu->gpu copy (#33055)
WoosukKwon
committed
44 days ago
Verified
6d86fde0
initial commit
Robert Shaw
committed
44 days ago
c217f287
[Bugfix][TPU] Return a Default fp8 MoE Backend (#32908)
vanbasten23
committed
44 days ago
Verified
510ed1e8
[Bugfix][MXFP4] Call `trtllm_fp4_block_scale_moe` with kwargs (#33104)
wpc
committed
44 days ago
Verified
8caffd92
[fix] CPUDNNLGEMMHandler pointer baked into inductor artifact (#32913)
dolpm
committed
44 days ago
Verified
58a05b0c
[Logging] add `--disable-access-log-for-endpoints` CLI option (#30011)
JaredforReal
committed
44 days ago
Verified
6ee7f18f
[Refactor] Remove unused `_moe_permute` function (#33108)
yewentao256
committed
44 days ago
Verified
8f987883
[ci] Sync test areas with test-pipeline.yaml and enable new pipeline generator (#33080)
khluu
committed
44 days ago
Verified
ebe0ba91
[Bugfix] Fix Dtypes for Pynccl Wrapper (#33030)
robertgshaw2-redhat
committed
44 days ago
Verified
43a013c3
[Model] Bump transformers version for test registry (#33100)
DarkLight1337
committed
44 days ago
Verified
c25dbee4
[Bugfix] Fix Voxtral streaming slot_mapping (#33073)
NickLucche
committed
44 days ago
Verified
19ab0f7c
[FIX] Always support TP > 4 for FP4 Gemm (#31099)
danielafrimi
committed
44 days ago
Verified
67fe677c
Remove unused logic in `models/mistral.py` (#33095)
andylolu2
committed
44 days ago
Verified
d56afd45