vllm-project/vllm

Reduce Scatter Plumbing

tlrmchlsmth committed 1 year ago

3679753a

[Bugfix] Initialize attention bias on the same device as Query/Key/Value for QwenVL Series (#14031)

LouieYang committed 1 year ago

Verified 9b61dd41

[VLM][Bugfix] Enable specifying prompt target via index (#14038)

DarkLight1337 committed 1 year ago

Verified f7bee5c8

[Bugfix] Fix MoeWNA16Method activation (#14024)

jeejeelee committed 1 year ago

Verified e0734387

Update AutoAWQ docs (#14042)

hmellor committed 1 year ago

Verified f58f8b5c

[V1][Minor] Restore V1 compatibility with LLMEngine class (#13090)

Ryp committed 1 year ago

Verified b3f7aacc

[Hardware][Intel-Gaudi] Regional compilation support (#13213)

Kacper-Pietkun committed 1 year ago

Verified b91660dd

Use smaller embedding model when not testing model specifically (#13891)

hmellor committed 1 year ago

Verified 76c89fca

[Bugfix][Disaggregated] patch the inflight batching on the decode node in SimpleConnector to avoid hangs in SimpleBuffer (nccl based) (#13987)

hasB4K committed 1 year ago

Verified b9e41734

[Doc] Move multimodal Embedding API example to Online Serving page (#14017)

DarkLight1337 committed 1 year ago

Verified 1088f062

[Bugfix] Check that number of images matches number of <|image|> tokens with mllama (#13911)

tjohnson31415 committed 1 year ago

Verified 73e0225e

[V1]`SupportsV0Only` protocol for model definitions (#13959)

ywang96 committed 1 year ago

Verified 6c85da3a

[Misc] Print FusedMoE detail info (#13974)

jeejeelee committed 1 year ago

Verified 67fc4268

[Model][Speculative Decoding] Expand DeepSeek MTP code to support k > n_predict (#13626)

benchislett committed 1 year ago

Verified 9804145c

[Attention] Flash MLA for V1 (#13867)

LucasWilkinson committed 1 year ago

Verified 2e94b9cf

[core] Perf improvement for DSv3 on AMD GPUs (#13718)

qli88 committed 1 year ago

Verified 8294773e

[V1][Minor] Minor cleanup for GPU Model Runner (#13983)

WoosukKwon committed 1 year ago

Verified cd813c6d

[ROCm] Fix the Kernels, Core, and Prefix Caching AMD CI groups (#13970)

SageMoore committed 1 year ago

Verified 38acae6e

[VLM] Deprecate legacy input mapper for OOT multimodal models (#13979)

DarkLight1337 committed 1 year ago

Verified a2dd48c3

Bump azure/setup-helm from 4.2.0 to 4.3.0 (#13742)

dependabot[bot] committed 1 year ago

Verified 126f6bee

[Attention] MLA support for V1 (#13789)

Yang Chen committed 1 year ago

Verified 58d1b2aa

[VLM] Generalized prompt updates for multi-modal processor (#13964)

DarkLight1337 committed 1 year ago

Verified f1579b22

[Bugfix] Fix qwen2.5-vl overflow issue (#13968)

Isotr0py committed 1 year ago

Verified 78648758

Update LMFE version to v0.10.11 to support new versions of transforme… (#13930)

noamgat committed 1 year ago

Verified 1dd422b6

[bugfix] Fix profiling for RayDistributedExecutor (#13945)

ruisearch42 committed 1 year ago

Verified 06c8f8d8

Deduplicate `.pre-commit-config.yaml`'s `exclude` (#13967)

hmellor committed 1 year ago

Verified 5677c9bb

Update quickstart.md (#13958)

observerw committed 1 year ago

Verified 512d77d5

[Model] Deepseek GGUF support (#13167)

SzymonOzog committed 1 year ago

Verified 7f0be2aa

[VLM] Support multimodal inputs for Florence-2 models (#13320)

Isotr0py committed 1 year ago

Verified edf309eb

Fix test_block_fp8.py test for MoE (#13915)

mgoin committed 1 year ago

Verified 788f284b