vllm
[Update] Use FlashInfer fast_decode_plan directly instead of replication
#34687

Merged

pavanimajety merged 9 commits into vllm-project:main from askliar:main

askliar requested a review from mgoin 87 days ago

askliar requested a review from pavanimajety 87 days ago

mergify added nvidia

Refactor FlashInfer metadata handling and decoding parameters in flas…

c654c373

mergify added v1

askliar force pushed to c654c373 87 days ago

askliar changed the title from "[Update] Use FlashInfer fast_decode_plan directly instead of replication #32182" to "[Update] Use FlashInfer fast_decode_plan directly instead of replication" 87 days ago

gemini-code-assist commented on 2026-02-17

Add a blank line for improved readability in flashinfer.py

4e749384

Merge branch 'main' of https://github.com/vllm-project/vllm

65247d83

Add tests for fast_plan_decode functionality in flashinfer

4749656b

askliar requested a review from tlrmchlsmth 85 days ago

askliar requested a review from WoosukKwon 85 days ago

askliar requested a review from yewentao256 85 days ago

mgoin added ready

mgoin requested a review from LucasWilkinson 79 days ago

Enhance fast_plan_decode in flashinfer to support tensor-core specifi…

16583f52

Refactor fast_plan_decode and update tests for non-tensor-core support

f713140f

Merge branch 'main' of https://github.com/vllm-project/vllm

62eca944

Remove deprecated test for non-tensor core GQA in FlashInfer, simplif…

49a2bca4

pavanimajety approved these changes on 2026-02-26

Merge branch 'main' into main

8e0d95de

pavanimajety merged 56a63717 into main 77 days ago

Reviewers: pavanimajety, gemini-code-assist, mgoin, tlrmchlsmth, WoosukKwon, yewentao256, LucasWilkinson

Assignees: none

Labels: ready, v1, nvidia

Milestone: none