llama.cpp
CUDA: attention sinks for mma FlashAttention #15157
Merged

Commit e95d0430 by JohannesGaessler: CUDA: attention sinks for mma FlashAttention
Labels added by github-actions: Nvidia GPU, ggml
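For context, an attention sink is a per-head logit that participates in the softmax normalization but has no associated value vector, so it only shrinks the attention weights of the real KV entries. Below is a minimal host-side sketch of how such a sink could be folded into FlashAttention's online-softmax state after the KV loop; the struct and function names are illustrative, not the kernel's actual code.

```cpp
// Host-side sketch (not the actual CUDA mma kernel) of folding a per-head
// attention sink into FlashAttention's online-softmax state after all KV
// tiles have been processed. All names here are illustrative.
#include <cmath>
#include <cstdio>
#include <vector>

// Online-softmax running state for one query row of one head.
struct FlashRowState {
    float m;                 // running maximum of the attention logits
    float d;                 // running denominator: sum of exp(logit - m)
    std::vector<float> acc;  // running numerator: sum of exp(logit - m) * V row
};

// The sink acts as one extra virtual logit that contributes to the
// denominator but has a zero value vector, so only m and d change;
// the accumulator is merely rescaled to the new maximum.
void apply_attention_sink(FlashRowState & st, float sink) {
    const float m_new = std::fmax(st.m, sink);
    const float scale = std::exp(st.m - m_new); // rescale old accumulators
    for (float & a : st.acc) {
        a *= scale;
    }
    st.d = st.d * scale + std::exp(sink - m_new);
    st.m = m_new;
}

int main() {
    // Toy example: two KV positions already accumulated, head dim 4.
    FlashRowState st;
    st.m = 1.5f; // max of the two logits {1.5, 0.5}
    st.d = std::exp(0.0f) + std::exp(-1.0f);
    st.acc = {0.4f, 0.1f, 0.2f, 0.3f};

    apply_attention_sink(st, /*sink=*/2.0f);

    // Final output = acc / d, as in the standard FlashAttention epilogue.
    for (float a : st.acc) {
        printf("%f ", a / st.d);
    }
    printf("\n");
    return 0;
}
```

Folding the sink in once at the end, rather than treating it as an extra KV column, keeps the per-tile inner loop unchanged; only the epilogue needs the extra rescale.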
slaren approved these changes on 2025-08-07
ggerganov approved these changes on 2025-08-08
JohannesGaessler merged 1425f587 into master 31 days ago
am17an commented on 2025-08-08
