vllm-project/vllm

Pull Requests Commits

tlrmchlsmth committed 236 days ago

2f86f710

Add VLLM_DISTRIBUTED_INIT_TIMEOUT_SECONDS

tlrmchlsmth committed 236 days ago

feeb1730

[Kernel] Apply torch.Tag.needs_fixed_stride_order only for torch==2.6.0 (#19346)

zou3519 committed 236 days ago

Verified b2eb2b5a

[CI] Update CODEOWNERS for vllm/compilation (#21185)

zou3519 committed 236 days ago

Verified 21274ab4

Let GraniteMoeAttention use YaRN (#21174)

tdoublep committed 236 days ago

Verified ed8cbfed

[Core] Set pooling params based on task and model (#21128)

DarkLight1337 committed 236 days ago

Verified 45badd05

[Bugfix] Allocate less memory in non-batched CUTLASS MoE (#21121)

ElizaWszola committed 236 days ago

Verified 4adc66f6

[Doc] Fix typo in model name (#21178)

DarkLight1337 committed 236 days ago

Verified 55ad6487

[Bugfix] The special_tokens in tokenizer should also be controlled by do_lower_case in encoder_config. (#20750)

noooop committed 237 days ago

Verified 5895afd7

[Model] Re-add the implicit conversion feature for as_seq_cls_model (#21103)

noooop committed 237 days ago

Verified ca4eb82b

[Misc] Make MM embedding merge interface explicit in model runner (#21147)

Roger Wang committed 237 days ago

Verified ba2dfbb0

[benchmark] Sending request strictly follows the random intervals (#21108)

Jialin committed 237 days ago

Verified 1bf65138

[Misc] Do not print async output warning for v1 (#21151)

WoosukKwon committed 237 days ago

Verified 54cf1cae

[Perf] Add swap_ab to SM90 FP8 non-block CUTLASS moe grouped gemm (#20911)

shixianc committed 237 days ago

Verified 5780121c

[Core] FlashInfer CUTLASS fused MoE backend (NVFP4) (#20037)

wenscarl committed 237 days ago

Verified c7d8724e

[Doc] Add inplace weights loading example (#19640)

22quinn committed 237 days ago

Verified b38baabc

[Attention] Make local attention backend agnostic (#21093)

LucasWilkinson committed 237 days ago

Verified 89cab4d0

[Docs] Update supported models documentation with missing models (#20844)

luccafong committed 237 days ago

Verified b9a21e91

[Docs] Add minimal demo of Ray Data API usage (#21080)

crypdick committed 237 days ago

Verified c4e3b125

[Bugfix] Fix the tensor non-contiguous issue for Flashinfer TRT-LLM backend attention kernel (#21133)

elvischenv committed 237 days ago

Verified 8dfb45ca

[Log] Debugging Log with more Information (#20770)

yewentao256 committed 237 days ago

Verified 8a8fc946

[V0 deprecation] Remove V0 HPU backend (#21131)

WoosukKwon committed 237 days ago

Verified 4de71463

On environments where numa cannot be detected we get 0 (#21115)

ericcurtin committed 237 days ago

Verified ac9fb732

[Misc] Qwen MoE model supports LoRA (#20932)

jeejeelee committed 237 days ago

Verified a3a6c695

[Model] Update pooling model interface (#21058)

DarkLight1337 committed 237 days ago

Verified 90bd2ab6

[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762)

ElizaWszola committed 237 days ago

Verified 9fb2d220

[Docs] Move code block out of admonition now that it's short (#21118)

hmellor committed 237 days ago

Verified 2d6a3820

[Misc] Avoid unnecessary import (#21106)

wangxiyuan committed 237 days ago

Verified 89e3c4e9

[Docs] Improve docstring formatting for `FusedMoEParallelConfig.make` (#21117)

hmellor committed 237 days ago

Verified fe8a2c54

[VLM] Add Nemotron-Nano-VL-8B-V1 support (#20349)

kylehh committed 237 days ago

Verified 4ef00b5c

Older