vllm-project/vllm

Pull Requests Commits

Merge branch 'main' into seemethere/cuda_arm64

mgoin committed 196 days ago

5667ed87

[ROCm][Bugfix] Fix compilation error in topk softmax fused kernel (#22819)

kliuae committed 196 days ago

Verified c6cd5ca3

[CI/Build] Skip gpt_big model test because of broken HF model (#22848)

Isotr0py committed 196 days ago

Verified df0e0f02

[CI/Build] Fix param mismatch in `test_eagle_correctness` (#22847)

DarkLight1337 committed 196 days ago

Verified b4b78d63

[CI] Fix `tests/v1/e2e/test_kv_sharing_fast_prefill.py` import on test (#22815)

NickLucche committed 196 days ago

Verified 12817a8a

Update torch_cuda_arch_list

seemethere committed 196 days ago

846aa6dc

ci: Add CUDA + arm64 relase builds

seemethere committed 196 days ago

f8b2e006

[CI/Build] Update VLM common tests (#22841)

DarkLight1337 committed 196 days ago

Verified c9232d41

[Bugfix] Fix MiniCPMV Image input inference failed (#22813)

jio-H committed 196 days ago

Verified 9bd9294f

[Misc] clear and separate error messages for input too long and input + max-tokens too long (#22803)

Roger Wang committed 196 days ago

Verified da270519

[Core] Use individual MM items in P0/P1 cache and model runner (#22570)

DarkLight1337 committed 196 days ago

Verified 19b927e5

[Frontend] Multithreaded async multimodal load_bytes (#22710)

milesial committed 196 days ago

Verified 20d65aa7

Fix GGUF loader for Qwen3 MoE. (#22785)

Gh0u1L5 committed 196 days ago

Verified b159c0a6

Remove unnecessary CUDA sync of qwen image and video preprocess (#22792)

cyyever committed 196 days ago

Verified 6772bb0f

[Bugfix][mamba] Fix type annotation of Mamba2Metadata (#22787)

heheda12345 committed 196 days ago

Verified fceafaf5

[Nixl][CI] Fix tests (#22806)

NickLucche committed 196 days ago

Verified 6b794c75

[FEATURE] support custom vllm tuned config path for fused moe triton kernels (#22791)

vermouth1992 committed 196 days ago

Verified 98deac38

[Frontend] Add chunked processing to handle long inputs in embedding models (#22280)

x22x22 committed 196 days ago

Verified 653124bd

[Platform] Custom ops support for FusedMoe (#22509)

wangxiyuan committed 196 days ago

Verified 0b1bdac6

[V1] Add tree drafting tests for eagle spec decoding (#22705)

TheEpicDolphin committed 196 days ago

Verified d94e3026

[Doc] Add max_lora_rank configuration guide (#22782)

chi2liu committed 196 days ago

Verified 3f52738d

[Bugfix] Fix Nemotron VL image processing (#22739)

ducviet00 committed 196 days ago

Verified a01e0018

[Model] Add missing prefix to glm4_1v (#22716)

zRzRzRzRzRzRzR committed 197 days ago

Verified 9e7e5baa

[Model] Add option to run Step3VisionEncoder in DP (#22697)

zzh142857 committed 197 days ago

Verified d16aa3da

[gpt-oss] upgrade gpt-oss to v0.0.3 and add version check (#22768)

heheda12345 committed 197 days ago

Verified 6807af8f

[Perf] Support topk softmax fused kernel for broader num_experts (#22211)

shixianc committed 197 days ago

Verified 4c558cf6

[Bug] Fix Unexpected Keyword Argument 'w1_bias' (#22757)

yewentao256 committed 197 days ago

Verified 77a6bf07

Remove unneeded ROCm platform import when using CUDA (#22765)

mgoin committed 197 days ago

Verified 4082338a

Force TRTLLM attention for gpt-oss on SM100 (#22678)

mgoin committed 197 days ago

Verified c6b92879

[Bugfix] Fix default enable for CUTLASS MLA on SM100 (#22738)

mgoin committed 197 days ago

Verified b1361c72

Older