vllm-project/vllm

[V1] Fix profiling.py

alexm-redhat committed 1 year ago

ccd21e19

[TPU][V1] Make `--disable_chunked_mm_input` mandatory for serving MM models (#16483)

NickLucche committed 1 year ago

Verified 4d022cbc

Fix erroneous "model doesn't support compile" warning (#16486)

zou3519 committed 1 year ago

Verified 70de35a8

[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779)

tzielinski-habana committed 1 year ago

Verified 34b2cf3b

[Bugfix] Fix bugs of running Quark quantized models (#16236)

chaow-amd committed 1 year ago

Verified 9e90c9f7

[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173)

DefTruth committed 1 year ago

Verified e9528f6d

Don't install triton on `ppc64le` platform (#16470)

hmellor committed 1 year ago

Verified 51baa9c3

[Misc] update api_client example (#16459)

reidliu41 committed 1 year ago

Verified 35e076b3

[Misc] Raise error for V1 not supporting Long LoRA. (#16415)

jeejeelee committed 1 year ago

Verified a26f59cc

Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447)

mgoin committed 1 year ago

Verified aa3b3d76

[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990)

jeejeelee committed 1 year ago

Verified f7030df3

Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (#16453)

DefTruth committed 1 year ago

Verified 905e91e9

[Bugfix] Don't set an upper bound on repetition penalty (#16403)

alex-jw-brooks committed 1 year ago

Verified f8f9c0ba

[CPU][Bugfix] Fix CPU docker issues (#16454)

bigPYJ1151 committed 1 year ago

Verified dda81102

[Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (#16424)

Isotr0py committed 1 year ago

Verified 93195146

Update supported_hardware.md for TPU INT8 (#16437)

mgoin committed 1 year ago

Verified ed375995

[Llama4] Enable attention temperature tuning by default for long context (>32k) (#16439)

shfoss committed 1 year ago

Verified 99ef59cf

update benchmark_serving_structured_output to include auto backend (#16438)

Chenyaaang committed 1 year ago

Verified d544d141

check input length of sonnet samples (#16423)

alexey-belyakov committed 1 year ago

Verified 3e397a94

Fix range_ratio Bug in RandomDataset (#16126)

jadewang21 committed 1 year ago

Verified 268c3250

[TPU][V1] Disable per-request seed/Generator (#16172)

NickLucche committed 1 year ago

Verified 3cc9af88

[Bugfix] Fix output token length check logic (#16419)

eeslook committed 1 year ago

Verified 7cd0bd72

[VLM] Avoid unnecessary dummy multimodal data during processing (#16416)

DarkLight1337 committed 1 year ago

Verified 56d4aefa

[V1] Zero-copy tensor/ndarray serialization/transmission (#13790)

njhill committed 1 year ago

Verified dd143ef5

[Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (#15423)

cyang49 committed 1 year ago

Verified daefed05

[Bugfix] Fix bug when dataset is json (#15899)

Chenyaaang committed 1 year ago

Verified 5fbab20e

[V1][Spec Decode] Eagle Model loading (#16035)

LiuXiaoxuanPKU committed 1 year ago

Verified e8224f3d

[V1] Set structured output backend to `auto` by default (#15724)

russellb committed 1 year ago

Verified 9665313c

Improve configs - `ParallelConfig` (#16332)

hmellor committed 1 year ago

Verified 0c54fc72

[TPU][V1] Use `language_model` interface for getting text backbone in MM (#16410)

NickLucche committed 1 year ago

Verified c1b57855
