vllm-project/vllm

[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820)

Asher committed 158 days ago

Verified 5a7fb3ab

[Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels (#20903)

varun-sundar-rabindranath committed 158 days ago

Verified 11dfdf21

[Bugfix]: Fix final_res_batch list index out of range error (#21055)

chaunceyjiang committed 158 days ago

Verified fdc5b43d

[Misc] Fix PhiMoE expert mapping (#21085)

jeejeelee committed 158 days ago

Verified c5b8b595

[V1] [KVConnector] Fix MultiprocExecutor worker output aggregation (#21048)

sdavidbd committed 158 days ago

Verified 4fcef49e

[V1][P/D]Enhance Performance and code readability for P2pNcclConnector (#20906)

Abatom committed 158 days ago

Verified 8a4e5c5f

[Attention] Refactor attention metadata builder interface (#20466)

LucasWilkinson committed 158 days ago

Verified 76b49444

[Bugfix] Fix Machete zero point issue for GPTQ models on SM90 (#21066)

mgoin committed 158 days ago

Verified 28a6d542

[TPU] Start using python 3.12 (#21000)

vanbasten23 committed 158 days ago

Verified 58760e12

[Docker] Allow FlashInfer to be built in the ARM CUDA Dockerfile (#21013)

mgoin committed 158 days ago

Verified a50d9182

[Bugfix] weight loading use correct tp_group with patch_tensor_parallel_group (#21024)

Kevin-XiongC committed 158 days ago

Verified c9ba8104

Update PyTorch to `torch==2.7.1` for CUDA (#21011)

mgoin committed 158 days ago

Verified 4e7dfbe7

Remove torch_xla.tpu.version() from pallas.py. (#21065)

QiliangCui committed 158 days ago

Verified 72ad2735

Support FP8 Quantization and Inference Run on Intel Gaudi (HPU) using INC (Intel Neural Compressor) (#12010)

nirda7 committed 158 days ago

Verified 01513a33

[Model] Remove model sampler (#21059)

DarkLight1337 committed 158 days ago

Verified ac2bf41e

Remove Qwen Omni workaround that's no longer necessary (#21057)

hmellor committed 159 days ago

Verified a931b4cd

[fix] fix qwen image_embeds input (#21049)

h-avsha committed 159 days ago

Verified a0f8a796

feat - add a new endpoint `get_tokenizer_info` to provide tokenizer/chat-template information (#20575)

m-misiura committed 159 days ago

Verified 18bdcf41

[Model] Consolidate pooler implementations (#20927)

DarkLight1337 committed 159 days ago

Verified 1c3198b6

[Docs] Add intro and fix 1-2-3 list in frameworks/open-webui.md (#19199)

windsonsea committed 159 days ago

Verified 260127ea

Fix inadvertently silenced PP tests for `mp`, add DeepSeek V2/V3 model family to PP tests (#20831)

eicherseiji committed 159 days ago

Verified d0dc4cfc

[BugFix] Fix import error on non-blackwell machines (#21020)

LucasWilkinson committed 159 days ago

Verified d31a6471

[TPU] fix kv_cache_update kernel block size choosing logic (#21007)

yaochengji committed 159 days ago

Verified 85431bd9

[Meta] Llama4 EAGLE Support (#20591)

morgendave committed 159 days ago

Verified c11013db

[CI] update typos config for CI pre-commit and fix some spells (#20919)

panpan0000 committed 159 days ago

Verified 1eb2b9c1

Avoid direct comparison of floating point numbers (#21002)

maxdebayser committed 159 days ago

Verified 6ebf3137

[Voxtral] Add more tests (#21010)

patrickvonplaten committed 159 days ago

Verified cfbcb9ed

[Doc] Remove duplicate docstring (#21012)

yewentao256 committed 159 days ago

Verified 76ddeff2

[Bugfix] Fix Mistral3 support on SM100/SM120 (#20998)

mgoin committed 159 days ago

Verified f4609833

[CI][HPU] update for v0 deprecate by switching to VLLM_TARGET_DEVICE=empty (#21006)

xuechendi committed 159 days ago

Verified e9534c72
