onnxruntime
99f2b806 - Fix cuda memory access violation in GQA FlashAttention (#24447)

Commit
299 days ago
Fix cuda memory access violation in GQA FlashAttention (#24447)

### Description
The `zeros_` memory buffer was left uninitialized, but it must be initialized to zero.

### Motivation and Context
A memory allocator change in GenAI caused crashes in FlashAttention, which were eventually traced to this uninitialized buffer. The allocator change itself was innocent. I'm not sure how this didn't fail previously, or, if it was failing, why we weren't getting reports about it.

Co-authored-by: Ryan Hill <{ID}+{username}@users.noreply.github.com>
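The failure mode described above (a buffer that is assumed to be zero but is never cleared, exposed only once the allocator starts handing back recycled memory) can be illustrated with a minimal host-side sketch. This is not the onnxruntime code: `RecyclingPool`, `AllZero`, and `AcquireZerosBuffer` are hypothetical names invented for illustration, and plain host memory stands in for device memory. On the GPU, the analogous fix would presumably be an explicit clear such as `cudaMemsetAsync` on the buffer after allocation.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical pooling allocator (illustration only): a freed block is handed
// back on the next Allocate() call without being cleared.
class RecyclingPool {
 public:
  explicit RecyclingPool(std::size_t block_bytes) : block_bytes_(block_bytes) {}

  unsigned char* Allocate() {
    if (!free_list_.empty()) {
      unsigned char* p = free_list_.back();
      free_list_.pop_back();
      return p;  // recycled block: previous contents are still there
    }
    // A fresh block happens to be zero-filled here, which mimics the case
    // where the missing initialization goes unnoticed.
    blocks_.emplace_back(new unsigned char[block_bytes_]());
    return blocks_.back().get();
  }

  void Free(unsigned char* p) { free_list_.push_back(p); }

  std::size_t block_bytes() const { return block_bytes_; }

 private:
  std::size_t block_bytes_;
  std::vector<std::unique_ptr<unsigned char[]>> blocks_;  // owns all blocks
  std::vector<unsigned char*> free_list_;                 // blocks ready for reuse
};

// Returns true if every byte in [p, p + n) is zero.
bool AllZero(const unsigned char* p, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    if (p[i] != 0) return false;
  }
  return true;
}

// Acquires a buffer that callers will treat as all zeros.
// With explicit_zero == false, the contents depend on the allocator's history.
unsigned char* AcquireZerosBuffer(RecyclingPool& pool, bool explicit_zero) {
  unsigned char* p = pool.Allocate();
  if (explicit_zero) {
    std::memset(p, 0, pool.block_bytes());  // the fix: never rely on the allocator
  }
  return p;
}
```

In this sketch, a buffer acquired without the explicit clear is only zero when the pool happens to return a fresh block. Once a previously used block is recycled, the stale bytes show through, which is consistent with the commit's observation that the allocator change was innocent and merely exposed the missing initialization.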