llama.cpp
[CUDA] Write an optimized flash_attn_stream_k_fixup kernel #21159 (Open)
gaugarg-nv wants to merge 2 commits into ggml-org:master from gaugarg-nv:fa_opt
Commit 99c3df82 (gaugarg-nv): Write an optimized flash_attn_stream_k_fixup kernel
gaugarg-nv requested a review 3 days ago
github-actions added labels: Nvidia GPU, ggml
gaugarg-nv changed the title from "Write an optimized flash_attn_stream_k_fixup kernel" to "[CUDA] Write an optimized flash_attn_stream_k_fixup kernel" 3 days ago
Commit d1fd632a (gaugarg-nv): Use the new kernel only for nblocks_stream_k_raw > 4 * ntiles_dst to …
JohannesGaessler commented on 2026-03-30