llama.cpp
CUDA: faster FA for GQA > 1 but not power of 2
#19092
Merged

JohannesGaessler: CUDA: faster FA for GQA > 1 but not power of 2 (66f7a86c)
github-actions added Nvidia GPU
github-actions added python
github-actions added ggml
ggerganov approved these changes on 2026-01-25
JohannesGaessler merged 0c21677e into master 28 days ago