vllm-project/vllm

Change default request logging behavior

simon-mo committed 347 days ago

5bb81e28

Support FP8 Quantization and Inference Run on Intel Gaudi (HPU) using INC (Intel Neural Compressor) (#12010)

nirda7 committed 348 days ago

Verified 01513a33

[Model] Remove model sampler (#21059)

DarkLight1337 committed 348 days ago

Verified ac2bf41e

Remove Qwen Omni workaround that's no longer necessary (#21057)

hmellor committed 348 days ago

Verified a931b4cd

[fix] fix qwen image_embeds input (#21049)

h-avsha committed 348 days ago

Verified a0f8a796

feat - add a new endpoint `get_tokenizer_info` to provide tokenizer/chat-template information (#20575)

m-misiura committed 348 days ago

Verified 18bdcf41

[Model] Consolidate pooler implementations (#20927)

DarkLight1337 committed 348 days ago

Verified 1c3198b6

[Docs] Add intro and fix 1-2-3 list in frameworks/open-webui.md (#19199)

windsonsea committed 348 days ago

Verified 260127ea

Fix inadvertently silenced PP tests for `mp`, add DeepSeek V2/V3 model family to PP tests (#20831)

eicherseiji committed 348 days ago

Verified d0dc4cfc

[BugFix] Fix import error on non-blackwell machines (#21020)

LucasWilkinson committed 349 days ago

Verified d31a6471

[TPU] fix kv_cache_update kernel block size choosing logic (#21007)

yaochengji committed 349 days ago

Verified 85431bd9

[Meta] Llama4 EAGLE Support (#20591)

morgendave committed 349 days ago

Verified c11013db

[CI] update typos config for CI pre-commit and fix some spells (#20919)

panpan0000 committed 349 days ago

Verified 1eb2b9c1

Avoid direct comparison of floating point numbers (#21002)

maxdebayser committed 349 days ago

Verified 6ebf3137

[Voxtral] Add more tests (#21010)

patrickvonplaten committed 349 days ago

Verified cfbcb9ed

[Doc] Remove duplicate docstring (#21012)

yewentao256 committed 349 days ago

Verified 76ddeff2

[Bugfix] Fix Mistral3 support on SM100/SM120 (#20998)

mgoin committed 349 days ago

Verified f4609833

[CI][HPU] update for v0 deprecate by switching to VLLM_TARGET_DEVICE=empty (#21006)

xuechendi committed 349 days ago

Verified e9534c72

Add Dockerfile argument for VLLM_USE_PRECOMPILED environment (#20943)

dougbtv committed 349 days ago

Verified 79764460

[Bugfix] Correct per_act_token in CompressedTensorsW8A8Fp8MoECutlassM… (#20937)

minosfuture committed 349 days ago

Verified fcb9f879

[Docs] Enhance Anyscale documentation, add quickstart links for vLLM (#21018)

crypdick committed 349 days ago

Verified 3ed94f9d

[Misc] Refactor: Improve argument handling for `conda` command (#20481)

reidliu41 committed 349 days ago

Verified fa839565

[Chore] Remove outdated transformers check (#20989)

b8zhong committed 349 days ago

Verified 75a99b98

[Misc] bump xgrammar version to v0.1.21 (#20992)

chaunceyjiang committed 349 days ago

Verified b5c3b683

[Model] Add ModelConfig class for GraniteMoeHybrid to override default max_seq_len_to_capture (#20923)

tdoublep committed 349 days ago

Verified 6cbc4d4b

[Frontend] Remove print left in FrontendArgs.add_cli_args (#21004)

mgoin committed 349 days ago

Verified 153c6f1e

[Frontend] OpenAI Responses API supports input image (#20975)

chaunceyjiang committed 349 days ago

Verified 34cda778

[Nvidia] Integrate SM100 cudnn prefill API to MLA prefill (#20411)

elfiegg committed 349 days ago

Verified 30800b01

[Bug Fix] get_distributed_init_method should get the ip from get_ip i… (#20889)

Relics committed 349 days ago

Verified 10be2094

[Frontend] Support cache_salt in /v1/completions and /v1/responses (#20981)

dr75 committed 349 days ago

Verified 19c86306
