vllm-project/vllm

WoosukKwon committed 141 days ago

2ad6985c

WoosukKwon committed 141 days ago

da03cb8f

[Optimization] Truncate kv page indices for sliding window attention

WoosukKwon committed 141 days ago

90d43db4

[Log] Debug Once for Randomizing dummy data for DP Rank (#22860)

yewentao256 committed 141 days ago

Verified df5afa82

[Model] Granite-4 support loading quantized checkpoint (#22925)

cyang49 committed 141 days ago

Verified 6cd69f51

[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035)

bnellnm committed 141 days ago

Verified 8ad7285e

[Structured Output] Make the output of structured output example more complete (#22481)

shen-shanshan committed 141 days ago

Verified 48b01fd4

[Benchmarks] Include image data when ShareGPT4V dataset is used. (#22955)

huachenheli committed 141 days ago

Verified 993d3d12

[FIXBUG] Correctly Apply Grammar Bitmask in Mixed Batches (#22896)

JartX committed 141 days ago

Verified 68af77e5

[BugFix] Skip the Q component for QKVParallelLinear in the case of QKVCrossParallelLinear since its width is 0 (#22369)

sstamenk committed 141 days ago

Verified 6b04039a

[V0 Deprecation] Remove advance_step (#22969)

WoosukKwon committed 141 days ago

Verified 1c859a13

[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)

fhl2000 committed 141 days ago

Verified 74f441f4

[Frontend] Expose do_log_stats interval to env (#22905)

Csrayz committed 141 days ago

Verified a0632a3e

[CI] Remove duplicated docs build from buildkite (#22924)

hmellor committed 141 days ago

Verified e8b40c7f

[Misc] Ignore ep_kernels_workspace (#22807)

jeejeelee committed 141 days ago

Verified 48f46369

[V1] [Hybrid] Support using float32 for state in Hybrid Models (Mamba2, Mamba1, Minimax) (#22928)

tdoublep committed 141 days ago

Verified 75531a6c

Improve multimodal hasher performance for re-used Image prompts (#22825)

p88h committed 141 days ago

Verified 22341b99

[MM] Allow skipping memory profiling for multimodal models. (#22950)

Roger Wang committed 141 days ago

Verified 49252cf5

[Bugfix] fix cuda 12.6 and 11.8 build (#22952)

jinzhen-lin committed 141 days ago

Verified 3e6dd400

[Bugfix] Unquote file uri before reading image (#22912)

sayandipdutta committed 142 days ago

Verified aa300c43

[V1] - Split Prefill and Decode for Mamba1 models (#22653)

amirai21 committed 142 days ago

Verified fe91ce95

[CI] Pooling models mteb test uses enforce_eager (#22878)

noooop committed 142 days ago

Verified 5406ebf5

[P/D]Provide bucket algorithm rate limiter for proxy_server (#22643)

frankie-ys committed 142 days ago

Verified b2c06509

Revert "[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module." (#22956)

tjtanaa committed 142 days ago

Verified b2f6c247

[Mamba] - refactor: Renamed mamba_attn to mamba2_attn (#22818)

Josephasafg committed 142 days ago

Verified 3d232dbd

[Feature] Full Cuda Graph Support for Cutlass MLA and 6% E2E Throughput Improvement (#22763)

yewentao256 committed 142 days ago

Verified 5c3fbfe4

refactor: Change scaling factors calculation for flashinfer FusedMoE (#22812)

amirkl94 committed 142 days ago

Verified b4cef5e6

[CI Perf] Prune tests in `tests/kernels/attention/` (#22936)

mgoin committed 142 days ago

Verified 0fe85087

[CI Perf] Prune tests in `tests/kernels/moe/` (#22939)

mgoin committed 142 days ago

Verified d2b0e97e

[CI Perf] Prune tests in `tests/kernels/quantization/` (#22942)

mgoin committed 142 days ago

Verified 590bddbf
