vllm-project/vllm

Robert Shaw committed 6 hours ago

bec5dd8d

remove quant config function

Robert Shaw committed 6 hours ago

05f116ab

Robert Shaw committed 6 hours ago

bc240a3e

cleanup further

Robert Shaw committed 6 hours ago

6ef52bb4

add more values needed for apply

Robert Shaw committed 7 hours ago

fabe9799

Robert Shaw committed 7 hours ago

884461e8

remove unneeded conditional

Robert Shaw committed 7 hours ago

c4470f74

fix marlin selection logic

Robert Shaw committed 7 hours ago

f5e62cbe

fix marlin dtype

Robert Shaw committed 7 hours ago

e986eb70

Robert Shaw committed 7 hours ago

797480df

Robert Shaw committed 7 hours ago

dc5c4664

[Doc] Add documents for multi-node distributed serving with MP backend (#30509)

Isotr0py committed 10 hours ago

Verified 7c16f3fb

[Docs] Clarify Expert Parallel behavior for attention and MoE layers (#30615)

majiayu000 committed 11 hours ago

Verified ddbfbe52

set assume_32bit_indexing and pass unbacked hints (#30459)

laithsakka committed 13 hours ago

Verified 763963aa

[Refactor] `TokenizerRegistry` only uses lazy imports (#30609)

DarkLight1337 committed 13 hours ago

Verified 39cefbdf

[Bugfix] Qwen3-next with --hf-overrides {"num_hidden_layers":8} (#30433)

heheda12345 committed 14 hours ago

Verified ace34e37

[CI/Build] Fix broken mm processor test Mistral-3-large (#30597)

Isotr0py committed 16 hours ago

Verified e5db3e27

[Chore] Adjust tokenizer import to avoid circular imports (#30601)

DarkLight1337 committed 16 hours ago

Verified 64251f48

[Scheduler] Simplify stop checking for pooling models (#30591)

njhill committed 19 hours ago

Verified 1cec5b7e

[Bugfix] Dictionary MM embeddings for online chat (#30507)

DarkLight1337 committed 21 hours ago

Verified b09806e2

[Misc][Quantization] Clarify the intent of GGUF `FusedMoE` weight materialization (#30310)

a4lg committed 23 hours ago

Verified fdc135d7

[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)

LopezCastroRoberto committed 1 day ago

Verified 4fa7ce46

[CI] Whisper logprobs tests (#30504)

NickLucche committed 1 day ago

Verified 57e9bf18

[CI] Update several models in registry that are available online now (#30514)

mgoin committed 1 day ago

Verified 2f32a68d

[Docs] Remove references to `VLLM_ATTENTION_BACKEND` (#30564)

MatthewBonanni committed 1 day ago

Verified f5dfbbd8

Add IBM and Red Hat to compute resources sponsors (#30581)

mgoin committed 1 day ago

Verified fc011942

[Bugfix] Pass FA version in `MultiHeadAttention` (#30575)

MatthewBonanni committed 1 day ago

Verified 86a32615

[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality (#30292)

rasmith committed 1 day ago

Verified 08f8a562

[ci] Mark PrimeRL integration test as soft fail (#30578)

khluu committed 1 day ago

Verified b4039c08

[Refactor] Reduce duplicate code in `per_token_group_quant` cuda kernels (#30496)

yewentao256 committed 1 day ago

Verified 1e6b1153
