vulkan: optimizations for direct convolution #14933
vulkan: optimizations for direct convolution
9c12ef79
Three tile sizes for CONV_2D, and a heuristic to choose between them
136ecfbc
reallow collectives for pre-Turing
95ee61ac
make SHMEM_PAD a spec constant
7d3553fa
fixes for Intel perf: no shmem padding, placeholder shader core count
44566496
shader variants with/without unrolling
e8643c0f
0cc4m's fixes for AMD perf
d2a65ece
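The commit list says the PR adds three tile sizes for CONV_2D plus a heuristic to choose between them, and that the heuristic uses a shader core count. The PR page does not show the actual tile dimensions or rule. The sketch below is a hypothetical illustration of one plausible policy: treat the convolution as an implicit GEMM with a K (output channels) by NPQ (batch × output height × output width) result, and pick the largest tile that still launches at least one workgroup per shader core. The tile sizes, the `Tile` type and `choose_tile` are all assumptions, not the llama.cpp implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    name: str
    bs_k: int    # output channels covered by one workgroup (hypothetical)
    bs_npq: int  # output pixels (N*OH*OW) covered by one workgroup (hypothetical)


# Hypothetical tile sizes, ordered from largest to smallest.
TILES = [
    Tile("large", 128, 128),
    Tile("medium", 64, 64),
    Tile("small", 32, 32),
]


def workgroup_count(tile: Tile, k: int, npq: int) -> int:
    # Workgroups needed to cover a K x NPQ output matrix with this tile.
    return math.ceil(k / tile.bs_k) * math.ceil(npq / tile.bs_npq)


def choose_tile(k: int, npq: int, shader_core_count: int) -> Tile:
    """Return the largest tile that yields at least one workgroup per
    shader core; fall back to the smallest tile for tiny problems."""
    for tile in TILES:
        if workgroup_count(tile, k, npq) >= shader_core_count:
            return tile
    return TILES[-1]


if __name__ == "__main__":
    # 256 output channels on a 112x112 output, assumed 32 shader cores.
    print(choose_tile(256, 112 * 112, 32).name)
    # 64 output channels on a 28x28 output: large tiles would idle most cores.
    print(choose_tile(64, 28 * 28, 32).name)
```

The design trade-off this models: large tiles reuse more data per workgroup, but on small convolutions they launch too few workgroups to occupy every core, so smaller tiles win despite lower per-workgroup efficiency.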
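Two commits touch `SHMEM_PAD`: one turns it into a specialization constant, and a later one disables shared-memory padding for Intel. The PR page does not say what the padding is for. One common reason for padding shared-memory rows is to avoid bank conflicts when threads read down a column of a tile. The sketch below assumes 32 banks of 4-byte words (typical of NVIDIA hardware, not guaranteed elsewhere) and hypothetical helper names. It counts how many threads hit the same bank with and without one word of padding. A spec constant lets the host pick the padding per device (for example 0 where padding does not help) without keeping separate shader sources.

```python
from collections import Counter

NUM_BANKS = 32  # assumed: 32 shared-memory banks of 4-byte words


def conflict_degree(col: int, rows: int, tile_width: int, shmem_pad: int) -> int:
    """Thread i reads element (i, col) of a row-major shared-memory tile whose
    rows are tile_width + shmem_pad words long. Returns the largest number of
    threads that map to the same bank, i.e. the serialization factor."""
    row_stride = tile_width + shmem_pad
    banks = [(i * row_stride + col) % NUM_BANKS for i in range(rows)]
    return max(Counter(banks).values())


if __name__ == "__main__":
    # Unpadded 32-word rows: every thread in the column lands in one bank.
    print(conflict_degree(col=0, rows=32, tile_width=32, shmem_pad=0))
    # One word of padding shifts each row by one bank: no conflicts.
    print(conflict_degree(col=0, rows=32, tile_width=32, shmem_pad=1))
```

This model ignores vendor-specific details such as broadcast of identical addresses or different bank widths, which is one reason the best padding value can differ between GPUs.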
0cc4m approved these changes on 2025-08-02.
0cc4m merged a9f7541e into master 98 days ago.