vllm-project/vllm

Refactor ModelOptFp8MoEMethod to use modular kernels

Robert Shaw committed 90 days ago

5284a65b

progress towards single interface

Robert Shaw committed 92 days ago

fc6fa84b

progress towards single interface

Robert Shaw committed 92 days ago

453213f6

updated to use workspaces

Robert Shaw committed 93 days ago

919d679e

Robert Shaw committed 93 days ago

2b175179

[ROCm] Serving Fails on Radeon Due to AITER Dtype Import (#30952)

vllmellm committed 93 days ago

Verified 96bf50a2

[Bugfix][CPU] Fix Mac CPU build (#30955)

bigPYJ1151 committed 93 days ago

Verified f90d3636

[moe] Use enable_chunking func (to support disabling chunking) (#29935)

minosfuture committed 93 days ago

Verified 8372be28

[ROCm][Bugfix] Fix `fa_version` argument error in `flash_attn_maxseqlen_wrapper` for ROCm without aiter (#30909)

AndreasKaratzas committed 93 days ago

Verified 8da6ae49

[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) (#30910)

LucasWilkinson committed 93 days ago

Verified 30bb19a7

[Bugfix] Fix Unicode issues in GLM-4 tool calling (#30920)

chaunceyjiang committed 93 days ago

Verified aa7e8360

[ROCm][Bugfix] fix(structured_output): Skip guidance backend for schemas with patternProperties (#30730)

AndreasKaratzas committed 93 days ago

Verified be2ad5f9

[Platform] Let EPD work with non-cuda platform (#30225)

wangxiyuan committed 93 days ago

Verified a85724bd

[Fix][FlexAttention] return max logical block index to handle reused blocks (#30915)

ivanium committed 93 days ago

Verified 11a89cf9

[CPU] Refactor CPU fused MOE (#30531)

bigPYJ1151 committed 93 days ago

Verified e3ab93c8

fix: add warmup for audio preprocessing (#30706)

TheCodeWrangler committed 93 days ago

Verified fc2ae6d6

[KV connector][LMCache] Only record the cuda event when there are request to store/load (#30814)

ApostaC committed 93 days ago

Verified ec965569

[AMD][CI] fix lm eval ci arg (#30911)

divakar-amd committed 93 days ago

Verified 82dc338a

[PERF] Qwen3-next. Add fp8 cutlass MoE tuned configs. `chmod -x *MI308X.json` (#29553)

vadiklyutiy committed 93 days ago

Verified 717ac33d

[Doc][CPU] Update CPU doc (#30765)

bigPYJ1151 committed 93 days ago

Verified cfb7e555

[refactor] Add prefix support to embed_tokens in DeepSeek MTP (#30788)

zzhx1 committed 93 days ago

Verified b166ef20

[compile] Fix CI for test_gpt2_cache_hit (#30902)

zhxchen17 committed 93 days ago

Verified 5f2f3fba

[UX] Reduce DeepGEMM warmup log output to single progress bar (#30903)

MatthewBonanni committed 93 days ago

Verified 4a8412f7

[Quantization] Support Quark int4-fp8 w4a8 for MoE (#30071)

BowenBao committed 93 days ago

Verified 0c738b58

fused_moe_lora PDL improvements (#30716)

gnovack committed 93 days ago

Verified 5a3adf58

[Chore] Remove v0 dead code for Qwen2.5-omni (#30883)

Isotr0py committed 93 days ago

Verified 6fe58876

[NIXL] Support P tensor-parallel-size > D tensor-parallel-size (#27274)

NickLucche committed 93 days ago

Verified bc3700e0

[ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 (#30811)

micah-wil committed 93 days ago

Verified fd8afdf3

[Metrics] Model FLOPs Utilization estimation (#30738)

SungMinCho committed 93 days ago

Verified a0b782f9

[CI][Feature] Adds auto-rebase PR rule (#30875)

rafvasq committed 93 days ago

Verified ed2897f3
