[CUDA] Reduce the number of stream-k blocks to reduce the overhead of the flash_attn_stream_k_fixup kernel #21086
Reduce the number of stream-k blocks to reduce the overhead of the fl…
38a69358
IMbackK dismissed these changes on 2026-03-27
Fix compilation error
244f50d5
IMbackK dismissed their stale review 6 days ago
Remove trailing whitespace
4a2f0179