vllm-project/vllm

tlrmchlsmth committed 336 days ago

ddb65dad

Remove mamba-ssm package

tlrmchlsmth committed 336 days ago

c41ea526

[gpt-oss] Enhance error msg on attention sink init (#22335)

zyongye committed 336 days ago

Verified 31f5dc5b

[gpt-oss] Add loop for built-in tool call (#22374)

WoosukKwon committed 336 days ago

Verified ec7cb192

[Bugfix] Make condition in triton kernel constexpr (#22370)

gshtras committed 336 days ago

Verified 2435ea7e

[BugFix] Fix triton compile error in `kernel_unified_attention_2/3d` caused by attention sinks (#22368)

LucasWilkinson committed 336 days ago

Verified 4a6b72c2

add the codes to check AMD Instinct GPU number (#22367)

zhangnju committed 336 days ago

Verified b4b9813b

[BugFix] Fix FA2 RuntimeError when sinks is provided (#22365)

LucasWilkinson committed 336 days ago

Verified 2cb6ef89

[Minor] Fix type (#22347)

WoosukKwon committed 336 days ago

Verified 9edd1db0

[gpt-oss] Support chat completion api (#22342)

WoosukKwon committed 336 days ago

Verified f263a4b5

[gpt-oss] add model to supported models doc (#22336)

Roger Wang committed 336 days ago

Verified 54991c54

[gpt-oss] Add Tool/ConversationContext classes and harmony_utils (#22340)

WoosukKwon committed 336 days ago

Verified 178d03fb

[Misc] Clean up duplicated hf overrides (#22311)

Isotr0py committed 336 days ago

Verified fa00c5d7

[gpt-oss] Add openai-harmony as default dependency (#22332)

WoosukKwon committed 336 days ago

Verified 134a8ee8

[gpt-oss] flashinfer attention sink init (#22330)

zyongye committed 336 days ago

Verified 90ec0069

[GptOss] Add GptOss reasoning parser to support structure output (#22322)

heheda12345 committed 336 days ago

Verified a47e6ffe

[ROCm] Add attention sink to use_rocm_custom_paged_attention (#22329)

WoosukKwon committed 336 days ago

Verified 98a3a810

Add GPT-OSS model code and config [1/N] (#22327)

WoosukKwon committed 336 days ago

Verified de98252f

Update transformers to `v4.55` (#21931)

hmellor committed 336 days ago

Verified 796bae07

Add attention sink in attention backends (#22320)

WoosukKwon committed 336 days ago

Verified 6e209243

Increase openai-python version (#22316)

WoosukKwon committed 336 days ago

Verified dd16bdc7

Upgrade FA3 for attention sink (#22313)

WoosukKwon committed 336 days ago

Verified e3c876dc

[Bugfix][CI/Build][ROCm] Make sure to use the headers from the build folder on ROCm (#22264)

gshtras committed 336 days ago

Verified 5d5d419c

[Bugfix] Skip dead and non-GPU nodes for Ray DP engine allocation (#22275)

ruisearch42 committed 336 days ago

Verified 302962e8

[Perf] Parallelize fill_bitmask to accelerate high-throughput guided decoding (#21862)

benchislett committed 336 days ago

Verified 7e6544c7

[Bugfix] Fix MoE BNB version (#22260)

jeejeelee committed 336 days ago

Verified 8e6c7e87

[Bugfix] Fix 3D input passed into cutlass_scaled_mm (#22278)

mgoin committed 336 days ago

Verified 6a515304

[Bugfix] Remove faulty test for oot attention backend (#22286)

mgoin committed 337 days ago

Verified 35509fc5

[CI][TPU] Fix docker clean up (#22271)

lsy323 committed 337 days ago

Verified 4b29d278

[bugfix] fix blackwell deepep installation (#22255)

youkaichao committed 337 days ago

Verified 59a0b855
