llama.cpp
CPU/CUDA: Gemma 2 FlashAttention support #8542 (Merged)
JohannesGaessler merged 4 commits into ggml-org:master from JohannesGaessler:fattn-logit-softcap
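Background for the title: Gemma 2 soft-caps its attention logits before the softmax, squashing every raw score into the open interval (-cap, +cap) with a tanh; the existing FlashAttention kernels did not model that transform, so FlashAttention had remained unavailable for Gemma 2. This PR adds the capped variant to the CPU and CUDA kernels. A minimal sketch of the capping itself, assuming Gemma 2's published attn_logit_softcapping value of 50.0 (illustrative C, not the kernel code):

```c
#include <math.h>

// Gemma 2 style logit soft-capping: maps any raw attention score into
// the open interval (-cap, +cap). Sketch only; llama.cpp reads the cap
// from the model hparams rather than hard-coding it.
static inline float logit_softcap(float score, float cap) {
    return cap * tanhf(score / cap); // Gemma 2 uses cap = 50.0f for attention
}
```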
github-actions added labels: testing, Nvidia GPU, ggml
JohannesGaessler added label: Review Complexity : Medium
ggerganov commented on 2024-07-17
slaren commented on 2024-08-10
Commits (4):
86184137  CPU/CUDA: Gemma 2 FlashAttention support
8043640e  apply logit_softcap to scale in kernel (see the sketch below)
832c6ee3  disable logit softcapping tests on Metal
slaren approved these changes on 2024-08-24
6e408045  remove metal check
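The commit "apply logit_softcap to scale in kernel" exploits a small algebraic identity: the kernel needs cap * tanh(scale * s / cap), and the division by cap can be folded into the precomputed scale, so the hot loop does one multiply and one tanh per score with no division. A hedged sketch of that folding, with illustrative names rather than the kernel's own (the real CUDA path works on tiles of the QK^T matrix, not a flat array):

```c
#include <math.h>

// Identity behind the commit:
//   cap * tanhf((scale / cap) * s) == cap * tanhf(scale * s / cap)
// Hoisting scale/cap out of the loop removes a per-score division.
void softcap_scores(float * scores, int n, float scale, float cap) {
    const float scale_capped = scale / cap; // computed once, not per score
    for (int i = 0; i < n; ++i) {
        scores[i] = cap * tanhf(scores[i] * scale_capped);
    }
}
```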
JohannesGaessler force-pushed from c3254644 to 6e408045 (1 year ago)
JohannesGaessler merged e11bd856 into master (1 year ago)
Reviewers: slaren, ggerganov, countzero
Assignees: no one assigned
Labels: testing, Nvidia GPU, Review Complexity : Medium, ggml
Milestone: no milestone