vllm-project/vllm

add debug cruft

tlrmchlsmth committed 1 year ago

fcec8c88

tlrmchlsmth committed 1 year ago

850dafea

tlrmchlsmth committed 1 year ago

b4f17e12

tlrmchlsmth committed 1 year ago

21ffc735

tlrmchlsmth committed 1 year ago

39d5d33f

tlrmchlsmth committed 1 year ago

7a821f0e

tlrmchlsmth committed 1 year ago

26fd8ca3

tlrmchlsmth committed 1 year ago

d5f20676

fixes - use-fp8-dispatch

varun-sundar-rabindranath committed 1 year ago

2b5ad9f2

DeepGEMM LL optimizations

tlrmchlsmth committed 1 year ago

299f8291

Merge remote-tracking branch 'nm/varun/deepep-fp8-dispatch' into ll_deepgemm_opt

tlrmchlsmth committed 1 year ago

104a984e

[Bugfix] fix RAY_CGRAPH_get_timeout is not set successfully (#19725)

chaunceyjiang committed 1 year ago

Verified 12575cfa

[Hardware][AMD] integrate aiter chunked prefill into vllm (#18596)

Zzz9990 committed 1 year ago

Verified 8b6e1d63

deep_ep + use_fp8_dispatch

varun-sundar-rabindranath committed 1 year ago

8de2fd39

[Qwen] Add tagging rule for Qwen related PRs (#19799)

houseroad committed 1 year ago

Verified 735a9de7

[Platform] Allow platform use V1 Engine by default (#19792)

wangxiyuan committed 1 year ago

Verified 257ab954

[doc] fix the incorrect label (#19787)

reidliu41 committed 1 year ago

Verified cca91a7a

[Minor] Zero-initialize attn output buffer (#19784)

WoosukKwon committed 1 year ago

Verified f04d6045

[V1] Decouple GPU and TPU `InputBatch` (#19778)

afeldman-nm committed 1 year ago

Verified 19a53b27

[V1][P/D] An native implementation of xPyD based on P2P NCCL (#18242)

Abatom committed 1 year ago

Verified eccdc831

[V1] Add API docs for EncoderCacheManager (#19294)

russellb committed 1 year ago

Verified 5f52a846

[Misc] Add __str__ for RequestStatus (#19780)

lk-chen committed 1 year ago

Verified d4629dc4

[MISC] correct DeviceConfig device field static type analysis (#19699)

andyxning committed 1 year ago

Verified 6e9cc73f

[MISC] correct copy_blocks src_to_dists param type (#19696)

andyxning committed 1 year ago

Verified c53711bd

[TPU] Update torch version to include paged attention kernel change (#19706)

Chenyaaang committed 1 year ago

Verified dac8cc49

[Feature][ROCm] Add full graph capture support for TritonAttentionBackend (#19158)

charlifu committed 1 year ago

Verified a44b1c95

[Bugfix] Fix faulty triton importing logic when using Ray for DP (#19734)

mgoin committed 1 year ago

Verified b447624e

[Misc] Update lmcache connector with the latest connector apis (#19441)

YaoJiayi committed 1 year ago

Verified cda92307

Remove sm120 arch from sm100 cutlass kernel arch list (#19716)

mgoin committed 1 year ago

Verified bf57ccc5

[Perf] Optimize `moe_align_block_size` CUDA kernel (#19572)

yewentao256 committed 1 year ago

Verified ffb2cd6b
