llama.cpp
CUDA: attention sinks for mma FlashAttention #15157
Merged

Commit e95d0430 by JohannesGaessler: CUDA: attention sinks for mma FlashAttention
Labels added by github-actions: Nvidia GPU, ggml
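For context, an attention sink is a per-head logit that participates in the softmax normalization but has no associated value vector, so it only shrinks the attention weights of the real KV entries. Below is a minimal host-side sketch of how such a sink could be folded into FlashAttention's online-softmax state after the KV loop; the struct and function names are illustrative, not the kernel's actual code.

```cpp
// Host-side sketch (not the actual CUDA mma kernel) of folding a per-head
// attention sink into FlashAttention's online-softmax state after all KV
// tiles have been processed. All names here are illustrative.
#include <cmath>
#include <cstdio>
#include <vector>

// Online-softmax running state for one query row of one head.
struct FlashRowState {
    float m;                 // running maximum of the attention logits
    float d;                 // running denominator: sum of exp(logit - m)
    std::vector<float> acc;  // running numerator: sum of exp(logit - m) * V row
};

// The sink acts as one extra virtual logit that contributes to the
// denominator but has a zero value vector, so only m and d change;
// the accumulator is merely rescaled to the new maximum.
void apply_attention_sink(FlashRowState & st, float sink) {
    const float m_new = std::fmax(st.m, sink);
    const float scale = std::exp(st.m - m_new); // rescale old accumulators
    for (float & a : st.acc) {
        a *= scale;
    }
    st.d = st.d * scale + std::exp(sink - m_new);
    st.m = m_new;
}

int main() {
    // Toy example: two KV positions already accumulated, head dim 4.
    FlashRowState st;
    st.m = 1.5f; // max of the two logits {1.5, 0.5}
    st.d = std::exp(0.0f) + std::exp(-1.0f);
    st.acc = {0.4f, 0.1f, 0.2f, 0.3f};

    apply_attention_sink(st, /*sink=*/2.0f);

    // Final output = acc / d, as in the standard FlashAttention epilogue.
    for (float a : st.acc) {
        printf("%f ", a / st.d);
    }
    printf("\n");
    return 0;
}
```

Folding the sink in once at the end, rather than treating it as an extra KV column, keeps the per-tile inner loop unchanged; only the epilogue needs the extra rescale.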
slaren approved these changes on 2025-08-07
ggerganov approved these changes on 2025-08-08
JohannesGaessler merged 1425f587 into master 31 days ago
am17an commented on 2025-08-08
