vllm-project/vllm

[MoE Refactor][3/N] Use Modular Kernels for ModelOpt FP8

Robert Shaw committed 7 hours ago

a6fa5113

ci: add nvidia-smi warmup before Prime-RL integration test (#31093)

AmeenP committed 8 hours ago

Verified 93cabc41

add aarnphm and chaunceyjiang to the new tool_parser directory (#31088)

chaunceyjiang committed 21 hours ago

Verified bb80f69b

[BugFix]fix gpt-oss v1/completions response bug (#30608)

princepride committed 21 hours ago

Verified 3e92b2b7

[Quantization] add marlin w4a8/w8a8 check (#31061)

jinzhen-lin committed 1 day ago

Verified 7c73ceb5

[CI] Fix H200 Distributed test (#31054)

LucasWilkinson committed 1 day ago

Verified ae0770fa

[Quantization] support logical_widths for fp8 marlin (#30962)

jinzhen-lin committed 1 day ago

Verified ee52d990

[MoE Refactor][5/N] Isolate zero expert to LongCatFlash (#28891)

baonudesifeizhai committed 1 day ago

Verified 54c89243

[XPU] enable fp8 online streaming quantization (#30944)

yma11 committed 1 day ago

Verified 560ae963

[Bugfix] Read truncate_prompt_tokens from pooling_params in AsyncLLM.encode() (#31013)

jeffreywang-anyscale committed 1 day ago

Verified 1501a407

[CI] FIx `fixture 'siglip_attention_config' not found` (#31053)

LucasWilkinson committed 1 day ago

Verified ff2168bc

[ROCm][CI/Build] Update ROCm dockerfiles (#30991)

gshtras committed 1 day ago

Verified 0be14952

[Bugfix] fix the alias bug of AttentionBackendEnum when register CUSTOM attention backend to vllm (#30869)

zejunchen-zejun committed 1 day ago

Verified d52c5096

GLM-4.7 Tool Parser and Doc Update (#30876)

zRzRzRzRzRzRzR committed 2 days ago

Verified 8a7a4143

[MoE Refactor][2/N] Use Modular Kernels for Fp8 (#30825)

robertgshaw2-redhat committed 2 days ago

Verified 95befecc

[Bug] Fix `error 'Dynamo failed to run FX node with fake tensors` for Deepseek V3.2 (#31046)

yewentao256 committed 2 days ago

Verified 4cf94298

[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) (#30990)

robertgshaw2-redhat committed 2 days ago

Verified 83a317f6

[BugFix] Fix TypeError: unhashable type: 'dict' when serving deepseek32 (#30924)

LucasWilkinson committed 2 days ago

Verified 5f6477d1

[Refactor] Refactor for `DeepGemmQuantScaleFMT` using cache (#30898)

yewentao256 committed 2 days ago

Verified 3bd8335b

Make engine core client handshake timeout configurable (#27444)

eicherseiji committed 2 days ago

Verified 1ab52135

[Model] Add MiMo-V2-Flash support (#30836)

Abatom committed 2 days ago

Verified 969bbc7c

Update Pytorch version update docs (#30982)

atalman committed 2 days ago

Verified 268a972c

[Quantization] fix marlin w8a8 check (#30961)

jinzhen-lin committed 2 days ago

Verified 5fbfa8d9

[CustomOp][Refactor] Extract common methods for ApplyRotaryEmb CustomOp (#31021)

shen-shanshan committed 2 days ago

Verified 23a1946e

[Bugfix] [Kernel] Triton attention kernels: mask out V blocks that fall outside sliding window (#30887)

tdoublep committed 2 days ago

Verified b5545d9d

[CPU][Bugfix] Fix ppc64le CPU build (#30871)

npanpaliya committed 2 days ago

Verified bd2b52fc

Enable aarch64 CPU performance benchmarks (#26494)

bigPYJ1151 committed 2 days ago

Verified 420ba2db

[Frontend][Bug] allow tool calls in analysis channel (#28139)

dr75 committed 2 days ago

Verified 45594967

[Bugfix] Add validation for tool requests when tool_parser is unavailable (#30613)

majiayu000 committed 2 days ago

Verified 086b9633

[Quantization] enable compressed-tensors marlin support for turing (2) (#31008)

jinzhen-lin committed 2 days ago

Verified 9187de9f
