onnxruntime
Fix cuda memory access violation in GQA FlashAttention
#24447
Merged

RyanUnderhill
The zeros_ buffer was uninitialized, so it did not always contain zeros. This would le…
9d4c6bb0
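
The class of bug named in the commit message (a buffer that is expected to hold zeros but is never written) can be illustrated with a minimal host-side C++ sketch. This is not the code from this PR: `MakeZeroedBuffer` and `AllZero` are hypothetical names introduced only for illustration, and the sketch assumes a plain integer scratch buffer. On the device, the analogous step would presumably be an explicit fill such as `cudaMemset` or `cudaMemsetAsync` right after allocation.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper (not from onnxruntime): allocate a scratch buffer that
// callers are allowed to assume is all zeros.
std::unique_ptr<int[]> MakeZeroedBuffer(std::size_t count) {
  // `new int[count]` leaves the contents indeterminate, which mirrors a fresh
  // device allocation: the memory may or may not happen to be zero.
  std::unique_ptr<int[]> buffer(new int[count]);

  // The explicit fill is what makes the "always zeros" assumption true.
  std::memset(buffer.get(), 0, count * sizeof(int));
  return buffer;
}

// Hypothetical check: returns true only if every element is zero.
bool AllZero(const int* data, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    if (data[i] != 0) {
      return false;
    }
  }
  return true;
}
```

Without the explicit fill, any code that reads the buffer as indices, lengths, or offsets may behave differently from run to run, depending on what the allocator returned.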
RyanUnderhill requested a review from aciddelgado 301 days ago
aciddelgado approved these changes on 2025-04-16
baijumeswani approved these changes on 2025-04-16
baijumeswani commented on 2025-04-16
tianleiwu changed the title from "Fix cuda memory access violation in FlashAttention" to "Fix cuda memory access violation in GQA FlashAttention" 301 days ago
RyanUnderhill merged 99f2b806 into main 301 days ago
RyanUnderhill deleted the ryanunderhill/flashattention_crash_fix branch 301 days ago

Assignees
No one assigned