PR #19092 CUDA: faster FA for GQA > 1 but not power of 2

JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:cuda-fa-gqa20-4

Commit 66f7a86c: CUDA: faster FA for GQA > 1 but not power of 2

ggerganov approved these changes on 2026-01-25

JohannesGaessler merged 0c21677e into master 28 days ago

Reviewers: ggerganov

Assignees: none

Labels: Nvidia GPU, python, ggml

Milestone: none