vllm-project/vllm

Update vllm/config/model.py

robertgshaw2-redhat committed 1 day ago

Verified 6de6e95e

fix failing quantization test

Robert Shaw committed 2 days ago

b39074e4

Robert Shaw committed 2 days ago

2f1daa86

Robert Shaw committed 2 days ago

7bb15cda

updated list of schemes

Robert Shaw committed 2 days ago

0afd11d9

Robert Shaw committed 2 days ago

522aff03

Robert Shaw committed 2 days ago

430670c5

[BugFix] Async scheduling: handle model forward errors more cleanly (#31611)

njhill committed 2 days ago

Verified b53b89fd

[misc] Sort uvicorn log level description according to verbosity (#31137)

andyxning committed 2 days ago

Verified 6522721d

fix no think of GLM-4.5 / GLM-4.7 (#31449)

zRzRzRzRzRzRzR committed 3 days ago

Verified 0d4044ed

[Docs] Fix argparse include path for mm-processor benchmark (#31654)

reaganjlee committed 3 days ago

Verified 41ab1797

[MoE Refactor][13/N] Convert FI to Use PFNoEP (#31533)

robertgshaw2-redhat committed 3 days ago

Verified 268b1c55

[CI][Bugfix] Fix token counting in chunked prefill compl test (#31630)

AndreasKaratzas committed 4 days ago

Verified 4f9ce35a

Improve HF qwen3_omni: preserve audio_sample_rate in kwargs restructuring (#29255)

jeremyteboul committed 4 days ago

Verified 97a01308

[Core] Parse vLLM engine required fields from hf_config to model_arch_config (#28454)

charlotte12l committed 4 days ago

Verified 0eee877f

[Benchmark] Fix OOM during MoE kernel tuning for large models (#31604)

massif-01 committed 4 days ago

Verified a0e9ee83

[MoE Refactor] Explicit construct mk for flashinfer bf16 kernel (#31504)

zyongye committed 4 days ago

Verified a3f2f409

[MoE Refactor] Split `invoke_fused_moe_kernel` (#31050)

zyongye committed 4 days ago

Verified 5a468ff7

[MoE] Fix output_shape calculation in Attention layer to handle 3D query inputs (#31596)

AndreasKaratzas committed 4 days ago

Verified 6ef770df

[BugFix] Support online dense model DP without overhead (#30739)

njhill committed 4 days ago

Verified bd877162

CustomOp: test forward dispatch for grouped_topk (#31530)

xinyu-intel committed 4 days ago

Verified 08f425ba

Add multimodal input method in the documentation (#31601)

labAxiaoming committed 4 days ago

Verified a01f2fae

[Bugfix] Fix weight_loader v1 block scale (#31103)

kyuyeunk committed 5 days ago

Verified cc410e86

[Bugfix][Hardware][AMD] Fix last_page_len calculation in AITER MLA decode (#31282)

c0de128 committed 5 days ago

Verified 825c2dc1

Remove unused `use_marlin` variable in `Mxfp4MoEMethod` (#31549)

vsourirajan committed 5 days ago

Verified 1f43c121

[Bugfix] Fix activation quantization for compressed-tensors W4A16 (#31572)

Tmn07 committed 5 days ago

Verified ca179d0f

[ROCm][CI] Fix ModernBERT token classification test (#31612)

AndreasKaratzas committed 5 days ago

Verified 013b5408

[Model] Enable LoRA support for tower and connector in LLaVA (#31513)

jayhemnani9910 committed 5 days ago

Verified 5ac55eb3

[Bugfix] Fix block size used in EAGLE slot mapping (#31540)

benchislett committed 5 days ago

Verified ea53ca5e

feat: support LoRA for DeepSeek-OCR (Language Model part) (#31569)

zhima771 committed 5 days ago

Verified 27864a85
