llama.cpp
[CUDA] Write an optimized flash_attn_stream_k_fixup kernel #21159 (Open)
gaugarg-nv wants to merge 2 commits into ggml-org:master from gaugarg-nv:fa_opt
Commit 99c3df82 (gaugarg-nv): Write an optimized flash_attn_stream_k_fixup kernel
gaugarg-nv requested a review 3 days ago
github-actions added labels: Nvidia GPU, ggml
gaugarg-nv changed the title from "Write an optimized flash_attn_stream_k_fixup kernel" to "[CUDA] Write an optimized flash_attn_stream_k_fixup kernel" 3 days ago
Commit d1fd632a (gaugarg-nv): Use the new kernel only for nblocks_stream_k_raw > 4 * ntiles_dst to …
JohannesGaessler commented on 2026-03-30