vllm-project/vllm

revert spurious change

Robert Shaw committed 74 days ago

275da3ca

Robert Shaw committed 74 days ago

a250ae33

update from nixl to internal

Robert Shaw committed 74 days ago

4b554d19

humans are still needed to write code

Robert Shaw committed 74 days ago

934224a2

[Feat][NIXL] Add KV lease refresh mechanism for disaggregated prefill

Robert Shaw committed 74 days ago

38c00afb

[Docs] Add breadcrumbs for better UX (#35749)

hmellor committed 74 days ago

Verified 7e9149d9

[MyPy][BugFix] Check profiler is assigned before calling start() on it (#35505)

hickeyma committed 74 days ago

Verified 87c98b02

Fix unresolved-import errors when using Astral's ty by removing src.root (#35681)

tlrmchlsmth committed 74 days ago

Verified de7dd634

[Feat] Supports Anthropic Messages count_tokens API (#35588)

chaunceyjiang committed 74 days ago

Verified 9a87b057

[Misc] Cleanup useless `current_platform` import (#35715)

wangxiyuan committed 74 days ago

Verified 510bc9e1

[CPU][Distributed] Fix Enable _CPUSHMDistributed only when TP/PP ranks share the same SHM group name (#34169)

charlesashby committed 74 days ago

Verified cbd361fd

[Misc] Bound NIXL upper bound version (#35495)

NickLucche committed 74 days ago

Verified c212202d

[CI] Defining extended V1 e2e + engine tests (#35580)

AndreasKaratzas committed 74 days ago

Verified ec27b36b

[Rocm][CI] Fix LM Eval Large Models (H100) test group (#34750)

charlifu committed 74 days ago

Verified 3fd1d4ec

[Kernel] Integrate SM100 MXFP8 blockscaled grouped MM and quant kernels (#34448)

EdalatiAli committed 74 days ago

Verified cb21972a

[ROCm][CI] Disable skinny GEMMs in language model standard tests to fix non-determinism (#35152)

AndreasKaratzas committed 74 days ago

Verified c34963f1

[ROCm] add amd-quark package in requirements for rocm to use quantized models (#35658)

hongxiayang committed 75 days ago

Verified f26650d6

[XPU] fix mxfp4 activation type (#35691)

jikunshang committed 75 days ago

Verified 92f5d0f0

Fix deprecated v1 config tests (#35327)

jcaip committed 75 days ago

Verified a60985b0

[Attention] FA4 integration (#32974)

LucasWilkinson committed 75 days ago

Verified 8b5014d3

Revert "[Bugfix] Disable TRTLLM attention with KV transfer enabled (#33192)" (#34832)

ZhanqiuHu committed 75 days ago

Verified 57a96e26

[torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 (#35475)

zou3519 committed 75 days ago

Verified e82fbeec

[Bugfix] Fix dtype mismatch in RMSNormGated.forward_native() during torch.compile (#35256)

haosdent committed 75 days ago

Verified 62904708

[Model Runner V2] Use block table apis for capture inputs (#35671)

WoosukKwon committed 75 days ago

Verified 72f4d162

fix(mxfp4): return is_monolithic=False when LoRA is enabled for Triton backend (#35382)

yoonsnowdev committed 75 days ago

Verified 5a435507

[MISC] Fixing a null reference by removing parallel_utils from mypy EXCLUDE (#35630)

taneem-ibrahim committed 75 days ago

Verified 59d7af9c

[Mamba1] - Kernel Level Chunk Alignment for Prefix Caching (#34798)

Josephasafg committed 75 days ago

Verified bbf81f9a

[Model Runner V2] Minor refactoring for EncoderRunner (#35628)

WoosukKwon committed 75 days ago

Verified da543d1a

[AMD][CI] Support Triton attention with ExampleConnector (#34931)

rjrock committed 75 days ago

Verified 87d319c5

Fix typo: implictly -> implicitly in isaac.py docstring (#35646)

lin-shh committed 75 days ago

Verified a9ec392c
