vllm-project/vllm

Roger Wang committed 1 year ago

f96a3cc7

Roger Wang committed 1 year ago

32c01557

Update CT WNA16MarlinMoE integration (#16666)

mgoin committed 1 year ago

Verified 22481fbf

[Misc] Auto fallback to float16 for pre-Ampere GPUs when detected bfloat16 config (#17265)

Isotr0py committed 1 year ago

Verified 5c4c08f6

[Misc] Add references in ray_serve_deepseek example (#17907)

ruisearch42 committed 1 year ago

Verified c44c384b

Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" (#17910)

mgoin committed 1 year ago

Verified 85b72cb7

[CI/Build] Automatically retry flaky tests (#17856)

DarkLight1337 committed 1 year ago

Verified 6e5595ca

[v1] Move block management logic from KVCacheManager to SpecializedManager (#17474)

heheda12345 committed 1 year ago

Verified 200da9a5

[BugFix][AMD] Compatible patch for latest AITER(05/07/2025) (#17864)

qli88 committed 1 year ago

Verified 9f64e934

[Misc] add dify integration (#17895)

reidliu41 committed 1 year ago

Verified ec61ea20

Change `top_k` to be disabled with `0` (still accept `-1` for now) (#17773)

hmellor committed 1 year ago

Verified c6798baa

Fix Whisper crash caused by invalid `max_num_batched_tokens` config (#17853)

inkcherry committed 1 year ago

Verified 5b2dcbf0

[Bugfix][CPU] Fix broken AVX2 CPU TP support (#17252)

Isotr0py committed 1 year ago

Verified 6e4a93e3

[Bugfix][ROCm] Fix AITER MLA V1 (#17880)

vllmellm committed 1 year ago

Verified 217db4ba

[Doc] remove visible token in doc (#17884)

yma11 committed 1 year ago

Verified ff8c4005

[Doc] Update several links in reasoning_outputs.md (#17846)

windsonsea committed 1 year ago

Verified 89a0315f

[Docs] Add Slides from NYC Meetup (#17879)

simon-mo committed 1 year ago

Verified 3d1e3876

[BUGFIX]: return fast when request requires prompt logprobs (#17251)

andyxning committed 1 year ago

Verified d310e6de

[Attention] MLA move rotary embedding to cuda-graph region (#17668)

LucasWilkinson committed 1 year ago

Verified 5e6f9394

[V1][Structured Output] Update llguidance (`>= 0.7.11`) to avoid AttributeError (no `StructTag`) (#17839)

shen-shanshan committed 1 year ago

Verified 760e3ecc

[FEAT][ROCm]: Support AITER MLA on V1 Engine (#17523)

vllmellm committed 1 year ago

Verified 3c9396a6

Add cutlass support for blackwell fp8 blockwise gemm (#14383)

wenscarl committed 1 year ago

Verified 376786fa

Fix noisy warning for uncalibrated q_scale/p_scale (#17414)

mgoin committed 1 year ago

Verified 4f605a6d

[CI] Prune down lm-eval small tests (#17012)

mgoin committed 1 year ago

Verified 8342e3ab

[Test] Attempt all TPU V1 tests, even if some of them fail. (#17334)

yarongmu-google committed 1 year ago

Verified a83a0f92

[V1] Improve VLLM_ALLOW_INSECURE_SERIALIZATION logging (#17860)

russellb committed 1 year ago

Verified 226a4272

[CI] Fix test_collective_rpc (#17858)

russellb committed 1 year ago

Verified ec54d73c

[Misc] Delete LoRA-related redundancy code (#17841)

jeejeelee committed 1 year ago

Verified a944f8ed

[Bugfix] `use_fast` failing to be propagated to Qwen2-VL image processor (#17838)

DarkLight1337 committed 1 year ago

Verified 015815fe

Fix transient dependency error in docs build (#17848)

hmellor committed 1 year ago

Verified e4ca6e3a
