vllm
56a63717
- [Update] Use FlashInfer fast_decode_plan directly instead of replication (#34687)
Commit
66 days ago
[Update] Use FlashInfer fast_decode_plan directly instead of replication (#34687)

Signed-off-by: Andrii <askliar@nvidia.com>
Co-authored-by: Andrii <askliar@nvidia.com>
References
#34687 - [Update] Use FlashInfer fast_decode_plan directly instead of replication
Author
askliar
Parents
62830211