llama.cpp
CPU/CUDA: Gemma 2 FlashAttention support #8542 (Merged)
JohannesGaessler merged 4 commits into ggml-org:master from JohannesGaessler:fattn-logit-softcap
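Background for the title: Gemma 2 soft-caps its attention logits before the softmax, squashing every raw score into the open interval (-cap, +cap) with a tanh; the existing FlashAttention kernels did not model that transform, so FlashAttention had remained unavailable for Gemma 2. This PR adds the capped variant to the CPU and CUDA kernels. A minimal sketch of the capping itself, assuming Gemma 2's published attn_logit_softcapping value of 50.0 (illustrative C, not the kernel code):

```c
#include <math.h>

// Gemma 2 style logit soft-capping: maps any raw attention score into
// the open interval (-cap, +cap). Sketch only; llama.cpp reads the cap
// from the model hparams rather than hard-coding it.
static inline float logit_softcap(float score, float cap) {
    return cap * tanhf(score / cap); // Gemma 2 uses cap = 50.0f for attention
}
```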
github-actions added labels: testing, Nvidia GPU, ggml
JohannesGaessler added label: Review Complexity : Medium
ggerganov commented on 2024-07-17
slaren commented on 2024-08-10
Commits (4):
86184137  CPU/CUDA: Gemma 2 FlashAttention support
8043640e  apply logit_softcap to scale in kernel (see the sketch below)
832c6ee3  disable logit softcapping tests on Metal
slaren approved these changes on 2024-08-24
6e408045  remove metal check
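The commit "apply logit_softcap to scale in kernel" exploits a small algebraic identity: the kernel needs cap * tanh(scale * s / cap), and the division by cap can be folded into the precomputed scale, so the hot loop does one multiply and one tanh per score with no division. A hedged sketch of that folding, with illustrative names rather than the kernel's own (the real CUDA path works on tiles of the QK^T matrix, not a flat array):

```c
#include <math.h>

// Identity behind the commit:
//   cap * tanhf((scale / cap) * s) == cap * tanhf(scale * s / cap)
// Hoisting scale/cap out of the loop removes a per-score division.
void softcap_scores(float * scores, int n, float scale, float cap) {
    const float scale_capped = scale / cap; // computed once, not per score
    for (int i = 0; i < n; ++i) {
        scores[i] = cap * tanhf(scores[i] * scale_capped);
    }
}
```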
JohannesGaessler force-pushed from c3254644 to 6e408045 (1 year ago)
JohannesGaessler merged e11bd856 into master (1 year ago)
Reviewers: slaren, ggerganov, countzero
Assignees: no one assigned
Labels: testing, Nvidia GPU, Review Complexity : Medium, ggml
Milestone: no milestone