vllm-project/vllm

Robert Shaw committed 23 days ago

177d973b

[XPU]Support AgRsAll2AllManager on XPU device (#32654)

ys950902 committed 23 days ago

Verified 13f6630a

[4/N] Initialize MM components in context managers (M-P) (#32663)

DarkLight1337 committed 23 days ago

Verified fda3f03e

[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric (#32661)

carlory committed 23 days ago

Verified bb917203

[Bugfix] Fix the fp8_mqa_logits dim mismatch (#32652)

chaunceyjiang committed 23 days ago

Verified c4e5bdf6

[3/N] Initialize MM components in context managers (I-L) (#32650)

DarkLight1337 committed 23 days ago

Verified 7f1bcd18

[Core] Cleanup shm based object store on engine shutdown (#32429)

walterbm committed 23 days ago

Verified 8be263c3

[2/N] Initialize MM components in context managers (E-H) (#32641)

DarkLight1337 committed 23 days ago

Verified e1a34c3a

[Refactor] Make FP8 Linear Ops use kernel abstraction (#27814)

vllmellm committed 23 days ago

Verified 148117ea

[Model Runner V2] Skip kernel launch for penalties & logit_bias (#32634)

WoosukKwon committed 23 days ago

Verified e9c83cdc

[1/N] Initialize MM components in context managers (A-D) (#32632)

DarkLight1337 committed 23 days ago

Verified b75e85de

[Model] Use context managers for encoder- and LM-only mode (#32605)

DarkLight1337 committed 23 days ago

Verified 4753f3bf

[Model Runner V2] Decouple temperature from penalties (#32629)

WoosukKwon committed 23 days ago

Verified 6c01ffb8

[Model Runner V2] Refactor get_cudagraph_and_dp_padding (#32625)

WoosukKwon committed 23 days ago

Verified 7b7cdce9

[Feat] allow inplace loading lora (#31326)

Jackmin801 committed 23 days ago

Verified 12dab78f

[Model Runner V2] Initialized communication buffer for DP (#32624)

WoosukKwon committed 23 days ago

Verified 05dc4bfa

[Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32615)

MatthewBonanni committed 23 days ago

Verified 1a1fc3bb

[Model Runner V2] Refactor `dummy_run` (#32533)

WoosukKwon committed 23 days ago

Verified 43fada53

feat: spec decode with draft models (#24322)

tomasruizt committed 23 days ago

Verified 4a5299c9

docs: prefix caching seems quite outdated (#28784)

longregen committed 23 days ago

Verified 73f2a81c

[BugFix] Fix TRT-LLM NVFP4 DP/EP (#32349)

jiahanc committed 23 days ago

Verified 73503317

[CI] Add Helion as an optional dependency (#32482)

gmagogsfm committed 24 days ago

Verified 9d1e611f

[BUGFIX] Fix `test_mla_backends.py`. Scale MLA projection weights to prevent numerical instability (#32529)

vadiklyutiy committed 24 days ago

Verified 0727cc9e

[CI][amd] Revert NIXL connector change to avoid crash (#32570)

qli88 committed 24 days ago

Verified a0490be8

support dynamic resolution image encoding for Nemotron Nano VL (#32121)

netanel-haber committed 24 days ago

Verified cd3ac5b7

[Misc] Remove unused ModelKeys (#32608)

jeejeelee committed 24 days ago

Verified 2636d762

Add support for LoRA adapters in Nemotron-H models (#30802)

danisereb committed 24 days ago

Verified aa7f37cc

[Frontend] Score entrypoint support data_1 & data_2 and queries & documents as inputs (#32577)

noooop committed 24 days ago

Verified c88860d7

[NIXL][Metrics] Track `nixl_num_kv_expired_reqs` metric in Prometheus (#32340)

NickLucche committed 24 days ago

Verified 758df5af

[CI/Build] Fix dependency conflict between model-hosting-container-standards and starlette (#32560)

DanielMe committed 24 days ago

Verified cdd03d25
