vllm-project/vllm
Commits
moe-refactor-modelopt-fp8
[MoE Refactor][3/N] Use Modular Kernels for ModelOpt FP8
Robert Shaw
committed
7 hours ago
a6fa5113
ci: add nvidia-smi warmup before Prime-RL integration test (#31093)
AmeenP
committed
8 hours ago
Verified
93cabc41
add aarnphm and chaunceyjiang to the new tool_parser directory (#31088)
chaunceyjiang
committed
21 hours ago
Verified
bb80f69b
[BugFix]fix gpt-oss v1/completions response bug (#30608)
princepride
committed
21 hours ago
Verified
3e92b2b7
[Quantization] add marlin w4a8/w8a8 check (#31061)
jinzhen-lin
committed
1 day ago
Verified
7c73ceb5
[CI] Fix H200 Distributed test (#31054)
LucasWilkinson
committed
1 day ago
Verified
ae0770fa
[Quantization] support logical_widths for fp8 marlin (#30962)
jinzhen-lin
committed
1 day ago
Verified
ee52d990
[MoE Refactor][5/N] Isolate zero expert to LongCatFlash (#28891)
baonudesifeizhai
committed
1 day ago
Verified
54c89243
[XPU] enable fp8 online streaming quantization (#30944)
yma11
committed
1 day ago
Verified
560ae963
[Bugfix] Read truncate_prompt_tokens from pooling_params in AsyncLLM.encode() (#31013)
jeffreywang-anyscale
committed
1 day ago
Verified
1501a407
[CI] FIx `fixture 'siglip_attention_config' not found` (#31053)
LucasWilkinson
committed
1 day ago
Verified
ff2168bc
[ROCm][CI/Build] Update ROCm dockerfiles (#30991)
gshtras
committed
1 day ago
Verified
0be14952
[Bugfix] fix the alias bug of AttentionBackendEnum when register CUSTOM attention backend to vllm (#30869)
zejunchen-zejun
committed
1 day ago
Verified
d52c5096
GLM-4.7 Tool Parser and Doc Update (#30876)
zRzRzRzRzRzRzR
committed
2 days ago
Verified
8a7a4143
[MoE Refactor][2/N] Use Modular Kernels for Fp8 (#30825)
robertgshaw2-redhat
committed
2 days ago
Verified
95befecc
[Bug] Fix `error 'Dynamo failed to run FX node with fake tensors` for Deepseek V3.2 (#31046)
yewentao256
committed
2 days ago
Verified
4cf94298
[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) (#30990)
robertgshaw2-redhat
committed
2 days ago
Verified
83a317f6
[BugFix] Fix TypeError: unhashable type: 'dict' when serving deepseek32 (#30924)
LucasWilkinson
committed
2 days ago
Verified
5f6477d1
[Refactor] Refactor for `DeepGemmQuantScaleFMT` using cache (#30898)
yewentao256
committed
2 days ago
Verified
3bd8335b
Make engine core client handshake timeout configurable (#27444)
eicherseiji
committed
2 days ago
Verified
1ab52135
[Model] Add MiMo-V2-Flash support (#30836)
Abatom
committed
2 days ago
Verified
969bbc7c
Update Pytorch version update docs (#30982)
atalman
committed
2 days ago
Verified
268a972c
[Quantization] fix marlin w8a8 check (#30961)
jinzhen-lin
committed
2 days ago
Verified
5fbfa8d9
[CustomOp][Refactor] Extract common methods for ApplyRotaryEmb CustomOp (#31021)
shen-shanshan
committed
2 days ago
Verified
23a1946e
[Bugfix] [Kernel] Triton attention kernels: mask out V blocks that fall outside sliding window (#30887)
tdoublep
committed
2 days ago
Verified
b5545d9d
[CPU][Bugfix] Fix ppc64le CPU build (#30871)
npanpaliya
committed
2 days ago
Verified
bd2b52fc
Enable aarch64 CPU performance benchmarks (#26494)
bigPYJ1151
committed
2 days ago
Verified
420ba2db
[Frontend][Bug] allow tool calls in analysis channel (#28139)
dr75
committed
2 days ago
Verified
45594967
[Bugfix] Add validation for tool requests when tool_parser is unavailable (#30613)
majiayu000
committed
2 days ago
Verified
086b9633
[Quantization] enable compressed-tensors marlin support for turing (2) (#31008)
jinzhen-lin
committed
2 days ago
Verified
9187de9f