[CUDA] Increase number of output elements per-thread block if the K-dimension is small #20635
am17an
commented
on 2026-03-16
Increase per-thread work if the K-dimension is small
cfbbfb25
gaugarg-nv
force pushed
from
4f20a445
to
cfbbfb25
22 days ago
gaugarg-nv
changed the title from "[CUDA] Use a single warp per element instead of a single block per element if the K-dimension is small" to "[CUDA] Increase number of output elements per-thread block if the K-dimension is small" 22 days ago
Limit this change to ncols_dst = 1
6374ae0e
tab to space
fd9e3348
am17an
approved these changes
on 2026-03-21
am17an
merged
ccb87fa3
into master 18 days ago