vllm
228023b3 - [Bugfix][MoE] Fix 6-8% decode regression: prefer multi-stream shared expert overlap (#38990)

Commit
31 days ago
[Bugfix][MoE] Fix 6-8% decode regression: prefer multi-stream shared expert overlap (#38990)

Signed-off-by: Martin Vit <martin@voipmonitor.org>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>