CUDA: Add Flash Attention Support for Head Dimension 512 #20998
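This PR extends the CUDA flash attention kernels to support a head dimension of 512. As a rough picture of what adding a head size involves, here is a minimal sketch of per-head-size kernel dispatch; all names (`fattn_params`, `launch_fattn`, `dispatch_fattn`) are hypothetical illustrations, not llama.cpp's actual code. FA kernels are typically instantiated per head dimension at compile time, and a host-side switch selects the matching instantiation:

```cpp
// Minimal sketch of per-head-size kernel dispatch (hypothetical names,
// not llama.cpp's actual code). FA kernels are typically instantiated per
// head dimension at compile time; a host switch picks the matching one.
#include <cstdio>
#include <cstdlib>

struct fattn_params {
    const float * Q;
    const float * K;
    const float * V;
    float       * dst;
};

template <int D> // D = head dimension, fixed at compile time
static void launch_fattn(const fattn_params & p) {
    (void) p;
    std::printf("launching flash attention kernel for D=%d\n", D);
}

static void dispatch_fattn(int head_dim, const fattn_params & p) {
    switch (head_dim) {
        case  64: launch_fattn< 64>(p); break;
        case 128: launch_fattn<128>(p); break;
        case 256: launch_fattn<256>(p); break;
        case 512: launch_fattn<512>(p); break; // the new case this PR enables
        default:
            std::fprintf(stderr, "unsupported head size: %d\n", head_dim);
            std::abort();
    }
}
```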
add flash attention support for head dimension 512
3295b013
FA D=512 - match 576 configs, limit ncols2, revert vec cap
236cc490
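The config changes in this commit can be pictured roughly as follows. This is a hedged sketch: the function name, the GQA-based heuristic, and the cap value are assumptions for illustration, not the PR's actual constants. The idea is that a large head size reuses the kernel configurations of a nearby size (here D=576) but caps the column-tiling factor `ncols2` so the kernel still fits its shared memory and register budget:

```cpp
// Hedged sketch of capping ncols2 for large head sizes. The function name,
// heuristic, and cap value are illustrative assumptions, not the PR's code.
#include <algorithm>

static int pick_ncols2(int head_dim, int gqa_ratio) {
    // Wider column tiling amortizes K/V loads across grouped query heads.
    int ncols2 = gqa_ratio >= 4 ? 4 : (gqa_ratio >= 2 ? 2 : 1);
    if (head_dim >= 512) {
        // Larger D means bigger tiles; cap ncols2 so the kernel stays
        // within shared memory and register limits (illustrative cap).
        ncols2 = std::min(ncols2, 2);
    }
    return ncols2;
}
```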
fix HIP tile kernel build for D=512
f2ab605a
fix HIP tile kernel occupancy for D=512 on AMD
83f9dd1c
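Occupancy fixes of this kind usually amount to constraining a kernel's launch bounds so the compiler caps per-thread register use and keeps at least one block resident per multiprocessor. A minimal sketch under that assumption follows; the kernel name and block size are illustrative, not the PR's actual values:

```cpp
// Hedged sketch of an occupancy fix for a large-D tile kernel (hypothetical
// name and block size). __launch_bounds__(threads, min_blocks) tells the
// compiler to cap register usage so min_blocks blocks stay resident per
// SM/CU; the same attribute is honored under HIP on AMD GPUs.
#define FATTN_TILE_NTHREADS 256

__global__ void __launch_bounds__(FATTN_TILE_NTHREADS, 1)
fattn_tile_d512(const float * Q, const float * K, const float * V, float * dst) {
    // ... load Q/K/V tiles into shared memory, online softmax, accumulate ...
    (void) Q; (void) K; (void) V; (void) dst;
}
```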
Apply suggestions from code review
a74faf55
ggerganov approved these changes on 2026-03-30
IMbackK requested changes on 2026-03-30
fix tile FA compilation
85bce7a8
IMbackK approved these changes on 2026-03-31
Assignees: No one assigned
Labels: Nvidia GPU, python, ggml