llama.cpp
CUDA: Add Flash Attention Support for Head Dimension 512 #20998
Merged

anavp-nvidia: flash attention support for head dimension 512 added (3295b013)
anavp-nvidia requested a review 7 days ago
github-actions added labels: Nvidia GPU, python, ggml
anavp-nvidia: FA D=512 - match 576 configs, limit ncols2, revert vec cap (236cc490)
anavp-nvidia: fix HIP tile kernel build for D=512 (f2ab605a)
anavp-nvidia: fix HIP tile kernel occupancy for D=512 on AMD (83f9dd1c)
JohannesGaessler commented on 2026-03-30
anavp-nvidia: Apply suggestions from code review (a74faf55)
JohannesGaessler approved these changes on 2026-03-30
ggerganov approved these changes on 2026-03-30
IMbackK requested changes on 2026-03-30
JohannesGaessler: fix tile FA compilation (85bce7a8)
JohannesGaessler force-pushed from 403966f2 to 85bce7a8 1 day ago
IMbackK approved these changes on 2026-03-31
JohannesGaessler approved these changes on 2026-04-01
JohannesGaessler merged 88458164 into master 1 day ago
ggerganov commented on 2026-04-01
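
For context, a minimal sketch of the op this PR extends, assuming ggml's public `ggml_flash_attn_ext` API: it builds a graph that calls flash attention with head dimension 512, the case the new CUDA kernels cover. All sizes other than the head dimension (head count, query/KV lengths) are illustrative assumptions, not taken from the PR.

```c
// Minimal sketch (not from the PR): a ggml graph exercising flash attention
// with head dimension 512. Head/sequence counts are illustrative assumptions.
#include <math.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params ip = {
        /*.mem_size   =*/ 256u*1024u*1024u,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(ip);

    const int64_t d_head = 512; // the newly supported head dimension
    const int64_t n_head = 8;   // assumed head count
    const int64_t n_q    = 32;  // assumed number of query positions
    const int64_t n_kv   = 256; // assumed number of KV positions

    // ggml attention tensors are laid out as [head_dim, seq_len, n_head, batch]
    struct ggml_tensor * q = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, d_head, n_q,  n_head, 1);
    struct ggml_tensor * k = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, d_head, n_kv, n_head, 1);
    struct ggml_tensor * v = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, d_head, n_kv, n_head, 1);

    // fused flash attention: softmax(scale * Q K^T) V; no mask/ALiBi/softcap here
    struct ggml_tensor * out = ggml_flash_attn_ext(ctx, q, k, v, /*mask=*/NULL,
        /*scale=*/1.0f/sqrtf((float) d_head), /*max_bias=*/0.0f, /*logit_softcap=*/0.0f);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);
    // evaluating gf on a CUDA backend (ggml_backend_cuda_init + scheduler setup,
    // omitted here) is what would dispatch the new D=512 kernels

    ggml_free(ctx);
    return 0;
}
```

In the llama.cpp tree, the FLASH_ATTN_EXT cases in tests/test-backend-ops are the usual way to check such shapes end to end; the exact invocation varies by version, but something like `test-backend-ops test -o FLASH_ATTN_EXT` filters to this op.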
