llama.cpp
[CUDA] Reduce the number of stream-k blocks to reduce the overhead of the flash_attn_stream_k_fixup kernel
#21086
Closed

gaugarg-nv wants to merge 3 commits into ggml-org:master from gaugarg-nv:reduce_stream_k_block
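
For context, stream-k scheduling launches a fixed pool of thread blocks that each walk a contiguous range of work tiles; a tile split across two blocks produces partial results that the flash_attn_stream_k_fixup kernel must later combine, so launching fewer stream-k blocks shrinks that fixup pass. Below is a minimal host-side sketch of a block-count heuristic of this kind, not the PR's actual code; streamk_nblocks, ntiles, nsm, and waves are all hypothetical names.

```cpp
// Illustrative sketch only (hypothetical names, not llama.cpp's code).
// Fewer stream-k blocks means fewer tiles split across block boundaries,
// and therefore fewer partial results for the fixup kernel to combine.
#include <algorithm>
#include <cstdio>

// ntiles: total attention work tiles (e.g. batch * heads * Q tiles)
// nsm:    streaming multiprocessor count of the device
// waves:  blocks scheduled per SM; this is the knob being reduced
static int streamk_nblocks(int ntiles, int nsm, int waves) {
    // Launching more blocks than tiles only adds fixup bookkeeping.
    return std::min(ntiles, nsm * waves);
}

int main() {
    const int ntiles = 4096;
    const int nsm    = 132; // e.g. queried via cudaDeviceGetAttribute
    printf("2 waves: %d stream-k blocks\n", streamk_nblocks(ntiles, nsm, 2));
    printf("1 wave:  %d stream-k blocks\n", streamk_nblocks(ntiles, nsm, 1));
}
```

With these example numbers, dropping from 2 waves to 1 cuts the launch from 264 to 132 blocks, and since each block boundary can split at most one tile, it roughly halves the partial results the fixup kernel has to reconcile.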
gaugarg-nv: Reduce the number of stream-k blocks to reduce the overhead of the fl… (38a69358)
gaugarg-nv requested a review 6 days ago
github-actions added the Nvidia GPU and ggml labels
IMbackK dismissed these changes on 2026-03-27
gaugarg-nv: Fix compilation error (244f50d5)
IMbackK dismissed their stale review 6 days ago ("Misclick")
gaugarg-nv: Remove trailing whitespace (4a2f0179)
gaugarg-nv closed this 4 days ago
