llama.cpp
[CUDA] Write an optimized flash_attn_stream_k_fixup kernel
#21159
Merged


gaugarg-nv requested a review 14 days ago
github-actions added the Nvidia GPU and ggml labels
gaugarg-nv changed the title from "Write an optimized flash_attn_stream_k_fixup kernel" to "[CUDA] Write an optimized flash_attn_stream_k_fixup kernel" 14 days ago
JohannesGaessler commented on 2026-03-30
gaugarg-nv force-pushed from d1fd632a to 25ef2dfa 9 days ago
gaugarg-nv Write an optimized flash_attn_stream_k_fixup kernel
2ab29b93
gaugarg-nv Use the new kernel only for nblocks_stream_k_raw > 4 * ntiles_dst to …
19326bae
gaugarg-nv Address review comments
bb28013b
gaugarg-nv force-pushed from 35d08300 to bb28013b 9 days ago
JohannesGaessler approved these changes on 2026-04-06
ggerganov approved these changes on 2026-04-06
am17an commented on 2026-04-06
gaugarg-nv Address review comments
f4daaf5e
am17an approved these changes on 2026-04-06
gaugarg-nv Revert variable names to original
e4b95588
JohannesGaessler approved these changes on 2026-04-06
JohannesGaessler merged 15f786e6 into master 6 days ago
