vllm
ef2c4f77 - [Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442)

Commit
51 days ago
[Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442)

Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>