vllm-project/vllm

Remove executable flag on a few files

tlrmchlsmth committed 363 days ago

f8768f52

[Kernels] MoE refactor (#19636)

bnellnm committed 363 days ago

Verified c1909e7e

Documentation update tool_calling: mapping back to function from response (#20373)

cronoik-inceptionai committed 363 days ago

Verified b9587750

[Model] Adds support for SlimMoE models Phi-tiny-MoE-instruct (#20286)

zichongli5 committed 363 days ago

Verified 706ff132

[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models (#20322)

huaqiangwang committed 363 days ago

Verified ccbfb1d1

[NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) (#17280)

kaln27 committed 363 days ago

Verified 9e5552aa

[Build/CI] Automatically tag DeepSeek related PRs (#20370)

houseroad committed 363 days ago

Verified 0c600b9a

[Model] Add Ernie4.5 and Ernie4.5MoE Model Support (#20220)

CSWYF3634076 committed 363 days ago

Verified e303dcf5

[Docs] Make TPU ref prettier in google_tpu.md (#20356)

windsonsea committed 363 days ago

Verified ae9c4d41

[Docs] Fix indentations for 2-level items in deprecation_policy.md (#20352)

windsonsea committed 363 days ago

Verified d853520b

[Bugfix] Keye-VL compatibility with `tok_kwargs` (#20058) (#20353)

DarkLight1337 committed 363 days ago

Verified ba51aea6

[Model][VLM] Support Keye-VL-8B-Preview (#20126)

Kwai-Keye committed 363 days ago

Verified 8452946c

[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (#20105)

huachenheli committed 363 days ago

Verified 2e7cbf2d

[TPU] kv cache update kernel supports dynamic grid (#20235)

yaochengji committed 363 days ago

Verified 7da296be

[Doc][TPU] Add models and features supporting matrix. (#20230)

QiliangCui committed 363 days ago

Verified b205e846

fix[Docs]: link anchor is incorrect #20309 (#20315)

yyzxw committed 363 days ago

Verified be0cfb2b

[Bugfix] Fix dynamic rotary embedding (#20343)

DarkLight1337 committed 363 days ago

Verified 1a03dd49

[FIX][Intel GPU]fix ipex flash_attn_varlen_func api missing parameter (#20348)

jikunshang committed 363 days ago

Verified 27b80176

[Misc][Doc] Add missing comment for LLM (#20285)

draftbk committed 363 days ago

Verified 9ec1e306

[Refactor] Remove Unused Env `VLLM_ENABLE_MOE_ALIGN_BLOCK_SIZE_TRITON` (#20334)

yewentao256 committed 363 days ago

Verified 9dae7d46

[Refactor] Remove duplicate `find_free_port` (#20333)

yewentao256 committed 363 days ago

Verified 7058d7dd

[UT][intel GPU] use current_platform instead of device hardcode in v1 tests (#20169)

Liangliang-Ma committed 363 days ago

Verified a0389e05

[Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8 (#20324)

tlrmchlsmth committed 363 days ago

Verified 3be8d312

Enable group size 64 for Machete (#20290)

czhu-cohere committed 363 days ago

Verified 3abfe221

[Refactor] Refactor import utils (#20269)

yewentao256 committed 363 days ago

Verified e81fbefe

remove unused variables in marlin_template.h (#20236)

zhoutianzi666 committed 364 days ago

Verified 9290de56

[Optimization] Cache sampled token ids in model runner (#20291)

WoosukKwon committed 364 days ago

Verified 7f280d69

[V1] [ROCm] Enable EP with AITER Fused MoE (#20270)

tjtanaa committed 364 days ago

Verified 02cabff2

[Frontend] Expand tools even if tool_choice="none" (#17177)

okdshin committed 364 days ago

Verified 3d19d47d

[CUDA graphs] Enable full cuda graphs with FA3 AoT scheduling (#20301)

WoosukKwon committed 364 days ago

Verified 8acb4bad
