llama.cpp
CUDA: Prefer vector flash decoding kernel for Gemma models
#12738
Merged

gaugarg-nv
gaugarg-nv Prefer vector flash decoding kernel for Gemma models
f7d07dd2
gaugarg-nv requested a review from JohannesGaessler 1 year ago
github-actions added Nvidia GPU
github-actions added ggml
JohannesGaessler approved these changes on 2025-04-03
gaugarg-nv Update ggml/src/ggml-cuda/fattn.cu
ce71aba0
JohannesGaessler merged c262bedd into master 1 year ago
gaugarg-nv deleted the gemma_flash_attention branch 63 days ago
