[CUDA ] Write an optimized flash_attn_stream_k_fixup kernel #21159
gaugarg-nv changed the title from "Write an optimized flash_attn_stream_k_fixup kernel" to "[CUDA ] Write an optimized flash_attn_stream_k_fixup kernel" 14 days ago
gaugarg-nv force pushed from d1fd632a to 25ef2dfa 9 days ago
Write an optimized flash_attn_stream_k_fixup kernel
2ab29b93
Use the new kernel only for nblocks_stream_k_raw > 4 * ntiles_dst to …
19326bae
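The condition named in the commit title above can be sketched as a host-side check. This is only an illustration of the stated threshold: the helper name `use_optimized_fixup` and its signature are hypothetical and do not come from the PR, and the truncated remainder of the commit title is not reflected here.

```cpp
// Hypothetical helper; the name and signature are illustrative assumptions,
// not code from the PR.
// Returns true when the raw stream-k block count is strictly greater than
// four times the number of destination tiles, the threshold stated in the
// commit title.
static bool use_optimized_fixup(int nblocks_stream_k_raw, int ntiles_dst) {
    return nblocks_stream_k_raw > 4 * ntiles_dst;
}
```

Under this sketch, 132 raw blocks with 16 destination tiles would select the new kernel (132 > 64), while exactly 64 raw blocks with 16 tiles would not, since the comparison is strict.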
Address review comments
bb28013b
gaugarg-nv force pushed from 35d08300 to bb28013b 9 days ago
ggerganov approved these changes on 2026-04-06
am17an commented on 2026-04-06
Address review comments
f4daaf5e
am17an approved these changes on 2026-04-06
Revert variable names to original
e4b95588
Assignees
No one assigned