llama.cpp
CUDA: Add Flash Attention Support for Head Dimension 512 #20998
Merged

anavp-nvidia: flash attention support for head dimension 512 added (3295b013)
anavp-nvidia requested a review 7 days ago
github-actions added labels: Nvidia GPU, python, ggml
anavp-nvidia: FA D=512 - match 576 configs, limit ncols2, revert vec cap (236cc490)
anavp-nvidia: fix HIP tile kernel build for D=512 (f2ab605a)
anavp-nvidia: fix HIP tile kernel occupancy for D=512 on AMD (83f9dd1c)
JohannesGaessler commented on 2026-03-30
anavp-nvidia: Apply suggestions from code review (a74faf55)
JohannesGaessler approved these changes on 2026-03-30
ggerganov approved these changes on 2026-03-30
IMbackK requested changes on 2026-03-30
JohannesGaessler: fix tile FA compilation (85bce7a8)
JohannesGaessler force-pushed from 403966f2 to 85bce7a8 1 day ago
IMbackK approved these changes on 2026-03-31
JohannesGaessler approved these changes on 2026-04-01
JohannesGaessler merged 88458164 into master 1 day ago
ggerganov commented on 2026-04-01
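
For context, a minimal sketch of the op this PR extends, assuming ggml's public `ggml_flash_attn_ext` API: it builds a graph that calls flash attention with head dimension 512, the case the new CUDA kernels cover. All sizes other than the head dimension (head count, query/KV lengths) are illustrative assumptions, not taken from the PR.

```c
// Minimal sketch (not from the PR): a ggml graph exercising flash attention
// with head dimension 512. Head/sequence counts are illustrative assumptions.
#include <math.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params ip = {
        /*.mem_size   =*/ 256u*1024u*1024u,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(ip);

    const int64_t d_head = 512; // the newly supported head dimension
    const int64_t n_head = 8;   // assumed head count
    const int64_t n_q    = 32;  // assumed number of query positions
    const int64_t n_kv   = 256; // assumed number of KV positions

    // ggml attention tensors are laid out as [head_dim, seq_len, n_head, batch]
    struct ggml_tensor * q = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, d_head, n_q,  n_head, 1);
    struct ggml_tensor * k = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, d_head, n_kv, n_head, 1);
    struct ggml_tensor * v = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, d_head, n_kv, n_head, 1);

    // fused flash attention: softmax(scale * Q K^T) V; no mask/ALiBi/softcap here
    struct ggml_tensor * out = ggml_flash_attn_ext(ctx, q, k, v, /*mask=*/NULL,
        /*scale=*/1.0f/sqrtf((float) d_head), /*max_bias=*/0.0f, /*logit_softcap=*/0.0f);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);
    // evaluating gf on a CUDA backend (ggml_backend_cuda_init + scheduler setup,
    // omitted here) is what would dispatch the new D=512 kernels

    ggml_free(ctx);
    return 0;
}
```

In the llama.cpp tree, the FLASH_ATTN_EXT cases in tests/test-backend-ops are the usual way to check such shapes end to end; the exact invocation varies by version, but something like `test-backend-ops test -o FLASH_ATTN_EXT` filters to this op.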
