onnxruntime
8f8069dc - [webgpu] Optimize Conv by im2col-matmul (#26603)

Commit

28 days ago

[webgpu] Optimize Conv by im2col-matmul (#26603)

### Description

This PR optimizes the `Conv` operation by implementing two new compute shaders: `oihw_to_ohwi` and `im2col-matmul`.

`oihw_to_ohwi`: Improves performance over the default Transpose shader by staging data in workgroup memory, so that both global memory reads and global memory writes are contiguous.

`im2col-matmul`:
- Employs a workgroup size of 64.
- Dynamically selects the tile size (32x64 or 16x64) based on the source and weight shapes.
- Each invocation handles a dedicated weight element.
- Uses `subgroupShuffle` to efficiently access the source tile, leveraging `k_vec4` vectorization for better memory throughput.

Testing on Lunar Lake demonstrated **up to an 87%** performance improvement in Conv_2D operations.

### Motivation and Context

See above.
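The idea behind `oihw_to_ohwi` can be illustrated with a minimal CPU-side sketch. This is not the onnxruntime WGSL shader. For each output channel `o`, the `(I, H*W)` slab of an OIHW weight is transposed to `(H*W, I)`. One tile at a time is staged in a small buffer, which plays the role of workgroup memory, so that reads walk along `H*W` and writes walk along `I`, and both are contiguous. The name `oihw_to_ohwi` is borrowed from the PR only as a label. `TILE` and the flat-list representation are hypothetical choices made for illustration.

```python
"""Hypothetical CPU sketch of a tiled OIHW -> OHWI weight transpose.

A Python list stands in for GPU workgroup memory; TILE is an assumed
tile edge, not a value taken from onnxruntime.
"""

TILE = 4  # hypothetical tile edge used for staging


def oihw_to_ohwi(src, o_dim, i_dim, h_dim, w_dim):
    """Return a flat OHWI copy of the flat OIHW buffer `src`."""
    hw = h_dim * w_dim
    dst = [0] * len(src)
    for o in range(o_dim):
        # Channel o starts at the same offset in both layouts.
        base = o * i_dim * hw
        for i0 in range(0, i_dim, TILE):
            for p0 in range(0, hw, TILE):
                rows = min(TILE, i_dim - i0)
                cols = min(TILE, hw - p0)
                tile = [[None] * TILE for _ in range(TILE)]
                # Phase 1: read along p (= h * W + w), which is contiguous in OIHW.
                for di in range(rows):
                    for dp in range(cols):
                        tile[di][dp] = src[base + (i0 + di) * hw + (p0 + dp)]
                # Phase 2: write along i, which is contiguous in OHWI.
                for dp in range(cols):
                    for di in range(rows):
                        dst[base + (p0 + dp) * i_dim + (i0 + di)] = tile[di][dp]
    return dst
```

For example, `oihw_to_ohwi([0, 1, 2, 3], 1, 2, 1, 2)` returns `[0, 2, 1, 3]`. In this call the two input channels `[0, 1]` and `[2, 3]` are interleaved per spatial position. The naive element-wise transpose gets the same result, but one of its two sides (reads or writes) is strided. The tiled version avoids that on a GPU.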
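The im2col-matmul lowering itself can be shown with a plain reference sketch, assuming an NHWC input, OHWI weights, one group, no dilation, and symmetric zero padding. These layout and parameter choices are assumptions; the PR text does not state them. Each output pixel's receptive field is unrolled into a row ordered `(kh, kw, c)`, which matches the order of one OHWI weight row, so Conv reduces to a matrix multiply. The PR's shader adds 64-invocation workgroups, 32x64 or 16x64 tiles, `k_vec4` vectorization, and `subgroupShuffle`; none of that is modeled here. The function names are hypothetical.

```python
"""Hypothetical reference sketch: Conv2D as im2col followed by a matmul.

Assumes NHWC input, OHWI weights, group = 1, dilation = 1, and symmetric
zero padding. This only illustrates the math; it is not the WebGPU kernel.
"""


def im2col_nhwc(x, n_dim, h_dim, w_dim, c_dim, kh_dim, kw_dim, stride=1, pad=0):
    """Unfold flat NHWC input `x` into an (M, K) patch matrix.

    M = N * OH * OW output pixels; K = KH * KW * C, ordered (kh, kw, c).
    """
    oh_dim = (h_dim + 2 * pad - kh_dim) // stride + 1
    ow_dim = (w_dim + 2 * pad - kw_dim) // stride + 1
    rows = []
    for n in range(n_dim):
        for oh in range(oh_dim):
            for ow in range(ow_dim):
                row = []
                for kh in range(kh_dim):
                    for kw in range(kw_dim):
                        ih = oh * stride + kh - pad
                        iw = ow * stride + kw - pad
                        inside = 0 <= ih < h_dim and 0 <= iw < w_dim
                        base = ((n * h_dim + ih) * w_dim + iw) * c_dim
                        for c in range(c_dim):
                            # Taps that land in the padding contribute zero.
                            row.append(x[base + c] if inside else 0)
                rows.append(row)
    return rows, oh_dim, ow_dim


def conv2d_im2col(x, w_ohwi, n_dim, h_dim, w_dim, c_dim, o_dim,
                  kh_dim, kw_dim, stride=1, pad=0):
    """Return (out, OH, OW) with out[m][o] = sum_k patches[m][k] * w[o][k].

    `out` is the NHWC output flattened to M rows of O channels each.
    """
    patches, oh_dim, ow_dim = im2col_nhwc(
        x, n_dim, h_dim, w_dim, c_dim, kh_dim, kw_dim, stride, pad)
    k_dim = kh_dim * kw_dim * c_dim
    # Matmul: (M x K) patches times the transpose of the (O x K) weights.
    out = [
        [sum(row[k] * w_ohwi[o * k_dim + k] for k in range(k_dim))
         for o in range(o_dim)]
        for row in patches
    ]
    return out, oh_dim, ow_dim
```

As a usage example, take a 1x3x3x1 input holding 1..9 and convolve it with a single all-ones 2x2 filter: `conv2d_im2col(list(range(1, 10)), [1, 1, 1, 1], 1, 3, 3, 1, 1, 2, 2)` returns `([[12], [16], [24], [28]], 2, 2)`. Because the reduction dimension `K` is contiguous in both the patch rows and the OHWI weight rows, it is the natural axis to vectorize in groups of four. That is one plausible reading of the PR's `k_vec4`.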

References

#26603 - [webgpu] Optimize Conv by im2col-matmul

Author

daijh

Parents

817a44fc
