[CUDA ] Write an optimized flash_attn_stream_k_fixup kernel #21159
Write an optimized flash_attn_stream_k_fixup kernel
99c3df82
gaugarg-nv
changed the title from "Write an optimized flash_attn_stream_k_fixup kernel" to "[CUDA ] Write an optimized flash_attn_stream_k_fixup kernel" 3 days ago
Use the new kernel only for nblocks_stream_k_raw > 4 * ntiles_dst to …
d1fd632a