llama.cpp
CPU/CUDA: Gemma 2 FlashAttention support #8542 (Merged)

Opened by JohannesGaessler
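
The title and commit list indicate the change adds FlashAttention kernels for CPU and CUDA that handle Gemma 2's attention logit softcapping, which FlashAttention in llama.cpp previously did not support. A minimal usage sketch, assuming a build that includes this PR and a hypothetical local Gemma 2 GGUF path; -fa (--flash-attn) enables the FlashAttention path:

```sh
# Model path is hypothetical; -fa enables FlashAttention, and
# -ngl 99 offloads all layers to the GPU to exercise the CUDA kernels.
./llama-cli -m models/gemma-2-9b-it-Q4_K_M.gguf -fa -ngl 99 -p "Hello"
```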
github-actions added labels: testing, Nvidia GPU, ggml
JohannesGaessler added label: Review Complexity : Medium
Review discussion (comment bodies not captured): ggerganov commented on 2024-07-17; Dampfinchen, JohannesGaessler, vitorfdl, and Rotatingxenomorph followed up; slaren commented on 2024-08-10.
Timeline:
- Commit 86184137: CPU/CUDA: Gemma 2 FlashAttention support
- Commit 8043640e: apply logit_softcap to scale in kernel
- Commit 832c6ee3: disable logit softcapping tests on Metal
- slaren approved these changes on 2024-08-24
- Commit 6e408045: remove metal check
- JohannesGaessler force-pushed from c3254644 to 6e408045 (1 year ago)
- JohannesGaessler merged e11bd856 into master (1 year ago)
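
The 8043640e commit message ("apply logit_softcap to scale in kernel") points at an optimization: Gemma 2 softcaps attention logits as cap * tanh(logit / cap), and the division by the cap can be folded into the Q·K scale the kernel already applies. A minimal C++ sketch of that idea, not the actual ggml/CUDA kernel code; Gemma 2 sets attn_logit_softcapping to 50.0, and the function names here are illustrative:

```cpp
#include <cmath>

// Reference form: scale and softcap applied as two separate steps,
//     logit = cap * tanh((q_dot_k * scale) / cap)
float softcap_logit_reference(float q_dot_k, float scale, float cap) {
    const float s = q_dot_k * scale; // standard scaled attention score
    return cap * std::tanh(s / cap); // softcap bounds the logit to (-cap, cap)
}

// Folded form: divide the scale by the cap once, outside the kernel, so the
// hot loop does one multiply plus tanh per logit instead of a multiply, a
// divide, and a tanh. Mathematically identical to the reference form.
float softcap_logit_folded(float q_dot_k, float scale_over_cap, float cap) {
    return cap * std::tanh(q_dot_k * scale_over_cap);
}
```

Since FlashAttention kernels already take a precomputed scale parameter, passing scale / cap presumably lets the softcap ride along with no extra per-element work beyond the tanh.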
Post-merge discussion (comment bodies not captured): strawberrymelonpanda, JohannesGaessler, and slaren.
