vllm-project/vllm

zero out attn output

varun committed 73 days ago

be441ac5

Merge remote-tracking branch 'elvir/fix-ep-weight-filter-eplb' into gb200-0317

tlrmchlsmth committed 73 days ago

3c01ddbc

[Bugfix] Rescale NVFP4 weight scales to fix BF16 dequant underflow (#34577)

ricky-chaoju committed 73 days ago

Verified 24575899

[Bugfix] Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 (#37158)

dbari committed 73 days ago

Verified 1204cf0a

[Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache (#37252)

wzhao18 committed 73 days ago

Verified b36adfa3

[Deprecation] Deprecate `--calculate-kv-scales` option (#37201)

mgoin committed 73 days ago

Verified e78821b4

[Model] Remove unused `handle_oov_mm_token` (#37321)

DarkLight1337 committed 73 days ago

Verified 51f0acda

bump compressed-tensors version to 0.14.0.1 (#36988)

brian-dellabetta committed 73 days ago

Verified fa75204b

[Bug] Fix FlashInfer MNNVL socket collisions under concurrent vLLM jobs (#36674)

yewentao256 committed 73 days ago

Verified bdb903bb

[Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673)

atalman committed 73 days ago

Verified 68f783a7

[CI] Split Distributed Tests (4 GPUs) and Kernel MoE tests (#37100)

avinashsingh77 committed 73 days ago

Verified c5030c43

[Perf] Optimize top-k search in apply_top_k_top_p_triton sampler (#37225)

mgoin committed 73 days ago

Verified 51b2333b

[Bugfix] EP weight filter: don't skip scale tensors

elvircrn committed 73 days ago

Verified f12f88f6

[Bugfix] Disable EP weight filter when EPLB is enabled (#37136)

elvircrn committed 73 days ago

Verified e9e14c76

[CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __init__ (#37230)

AndreasKaratzas committed 73 days ago

Verified 4ed51308

[Bugfix] Standardize custom HF Processor init (#37289)

DarkLight1337 committed 73 days ago

Verified c781fbba

[BugFix] PyTorch Compilation Tests should error if any test fails (#37300)

zou3519 committed 73 days ago

Verified 979ff44c

[Bugfix] Fix DP MTP Dummy Run (#35243)

benchislett committed 73 days ago

Verified f63ed7b5

[openapi] remove redundant exception stack trace[4/N] (#37157)

andyxning committed 73 days ago

Verified c9e50962

[`UltraVox`] Fix output type (#37224)

vasqu committed 74 days ago

Verified 2ff0ad96

[Chore] Replace all base64 usages with faster pybase64 package (#37290)

Isotr0py committed 74 days ago

Verified a836524d

[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984)

bhoomit committed 74 days ago

Verified 3717a4dd

Fix Phi3 test that fails with Transformers v5 (#37298)

hmellor committed 74 days ago

Verified ecfcdd2c

[Bugfix] Fix unclean shutdown crash with AllReduce Fusion workspace (#36955)

siewcapital committed 74 days ago

Verified c25dbc2d

pick up tuned prefill configs for FP8 FA3 (#36265)

jmkuebler committed 74 days ago

Verified 77d2a5f1

[Frontend] Complete OpenAI render delegation (#37287)

sagearc committed 74 days ago

Verified 59192dfd

[Misc] Use VLLMValidationError in batch, pooling, and tokenize protocol validators (#36256)

umut-polat committed 74 days ago

Verified 56cb1baa

[1/2] Move InternVL-based processors (#37260)

DarkLight1337 committed 74 days ago

Verified f3403243

Bugfix for offloading+prefetch for GLM-4.7-FP8 (#37178)

sfbemerk committed 74 days ago

Verified 2660b928

Add gigachat 3.1 tool parser + fix gigachat3 tool parser (#36664)

ajpqs committed 74 days ago

Verified 293f036e
