llama.cpp
CUDA: add head size 72 for flash-attn
#16962
Merged

ngxson merged 1 commit into ggml-org:master from theo77186:fattn-hs72
theo77186 CUDA: add head size 72
72545ce2
theo77186 requested a review from JohannesGaessler 212 days ago
ngxson
JohannesGaessler
JohannesGaessler approved these changes on 2025-11-03
theo77186
ngxson
ngxson approved these changes on 2025-11-03
github-actions added Nvidia GPU
github-actions added python
github-actions added ggml
ngxson merged 622cd010 into master 212 days ago
theo77186 deleted the fattn-hs72 branch 212 days ago
