vllm-project/vllm


Robert Shaw committed 147 days ago

b6381ced

[doc] Add more details for Ray-based DP (#20948)

ruisearch42 committed 147 days ago

Verified d9127818

[MISC] Add init files for python package (#20908)

Potabk committed 147 days ago

Verified 20149d84

[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840)

tdoublep committed 147 days ago

Verified 3534c39a

[TPU] Optimize kv cache update kernel (#20415)

tengyifei committed 147 days ago

Verified c586b556

[Docs] Improve documentation for ray cluster launcher helper script (#20602)

crypdick committed 147 days ago

Verified 33d56000

[frontend] Refactor CLI Args for a better modular integration (#20206)

kouroshHakha committed 147 days ago

Verified f148c44c

[Docs] Improve documentation for RLHF example (#20598)

crypdick committed 147 days ago

Verified 235bfd5d

[frontend] Add --help=page option for paginated help output (#20961)

reidliu41 committed 147 days ago

Verified 68d28e37

[Misc] Refactor AllReduceFusionPass. Remove parameter (#20918)

ilmarkov committed 148 days ago

Verified 37a7d5d7

Implement Async Scheduling (#19970)

WoosukKwon committed 148 days ago

Verified d4d30940

[Model] Add AutoWeightsLoader support for BERT, RoBERTa (#20534)

jennifurhe committed 148 days ago

Verified 85bd6599

[cold start] replace VLLM_COMPILE_DEPYF with debug_dump_dir (#20940)

BoyuanFeng committed 148 days ago

Verified 91b3d190

[Doc] Clearer mistral3 and pixtral model support description (#20926)

Isotr0py committed 148 days ago

Verified fc017915

[Bugfix] Switch bailout logic for kv-cache-dtype with SM100 Flashinfer (#20934)

pavanimajety committed 148 days ago

Verified 9ad0a458

Enabled BnB NF4 inference on Gaudi (#20172)

rsshaik1 committed 148 days ago

Verified 016b8d1b

[CI] Fix flaky `test_streaming_response` test (#20913)

NickLucche committed 148 days ago

Verified 80305c1b

feat: add image zoom to improve image viewing experience (#20763)

reidliu41 committed 148 days ago

Verified 37e2ecac

[Docs] Add Kuberay to deployment integrations (#20592)

crypdick committed 148 days ago

Verified 054c8657

Use w8a8 quantized matmul Pallas kernel (#19170)

vanbasten23 committed 148 days ago

Verified d4170fad

[CI/Build] Split Entrypoints Test into LLM and API Server (#20945)

mgoin committed 148 days ago

Verified 946aadb4

[Bugfix] Fix incorrect dispatch for CutlassBlockScaledGroupedGemm and DeepGEMM (#20933)

mgoin committed 148 days ago

Verified bcdfb2a3

[BugFix] VLLM_DISABLE_COMPILE_CACHE=1 should disable all reads and writes from the cache (#20942)

zou3519 committed 148 days ago

Verified ba8c3000

SM100 Cutlass MLA decode with unrestricted num_heads (< 128) for DeepSeek TP (#20769)

alexm-redhat committed 148 days ago

Verified 8cdc3712

Fall back if flashinfer comm module not found (#20936)

sarckk committed 148 days ago

Verified 61e20828

[Docs] remove outdated performance benchmark (#20935)

KuntaiDu committed 148 days ago

Verified 55e1c66d

Fix overflow indexing in causal_conv1d kernel (#20938)

tdoublep committed 148 days ago

Verified 86f3ac21

[Misc] Relax translations tests (#20856)

NickLucche committed 148 days ago

Verified 149f2435

[Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts (#20725)

varun-sundar-rabindranath committed 148 days ago

Verified c0569dbc

Add benchmark dataset for mlperf llama tasks (#20338)

mgoin committed 148 days ago

Verified 8bb43b9c
