llama.cpp
[CUDA] Reduce the number of stream-k blocks to reduce the overhead of the flash_attn_stream_k_fixup kernel
#21086
Closed

gaugarg-nv wants to merge 3 commits into ggml-org:master from gaugarg-nv:reduce_stream_k_block
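
For context, stream-k scheduling launches a fixed pool of thread blocks that each walk a contiguous range of work tiles; a tile split across two blocks produces partial results that the flash_attn_stream_k_fixup kernel must later combine, so launching fewer stream-k blocks shrinks that fixup pass. Below is a minimal host-side sketch of a block-count heuristic of this kind, not the PR's actual code; streamk_nblocks, ntiles, nsm, and waves are all hypothetical names.

```cpp
// Illustrative sketch only (hypothetical names, not llama.cpp's code).
// Fewer stream-k blocks means fewer tiles split across block boundaries,
// and therefore fewer partial results for the fixup kernel to combine.
#include <algorithm>
#include <cstdio>

// ntiles: total attention work tiles (e.g. batch * heads * Q tiles)
// nsm:    streaming multiprocessor count of the device
// waves:  blocks scheduled per SM; this is the knob being reduced
static int streamk_nblocks(int ntiles, int nsm, int waves) {
    // Launching more blocks than tiles only adds fixup bookkeeping.
    return std::min(ntiles, nsm * waves);
}

int main() {
    const int ntiles = 4096;
    const int nsm    = 132; // e.g. queried via cudaDeviceGetAttribute
    printf("2 waves: %d stream-k blocks\n", streamk_nblocks(ntiles, nsm, 2));
    printf("1 wave:  %d stream-k blocks\n", streamk_nblocks(ntiles, nsm, 1));
}
```

With these example numbers, dropping from 2 waves to 1 cuts the launch from 264 to 132 blocks, and since each block boundary can split at most one tile, it roughly halves the partial results the fixup kernel has to reconcile.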
gaugarg-nv: Reduce the number of stream-k blocks to reduce the overhead of the fl… (38a69358)
gaugarg-nv requested a review 6 days ago
github-actions added the Nvidia GPU and ggml labels
IMbackK dismissed these changes on 2026-03-27
gaugarg-nv: Fix compilation error (244f50d5)
IMbackK dismissed their stale review 6 days ago ("Misclick")
gaugarg-nv: Remove trailing whitespace (4a2f0179)
gaugarg-nv closed this 4 days ago
