onnxruntime
1ad9f121 - [webgpu] Use 64 as the workgroup size of DP4AMatMulQuantize (#24129)

Commit
132 days ago
A workgroup size of 1 is usually a poor choice for a compute shader, because only one thread is active in each workgroup. This PR uses 64 as the workgroup size of DP4AMatMulQuantize.

Measured time before -> after the change:
On Qualcomm Adreno x1-85 GPU: 721.13 ms -> 148.38 ms
On NV RTX 2000 Ada: 87.66 ms -> 14.51 ms
On Intel Xe GPU: 76.30 ms -> 42.96 ms
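To make the effect of the change concrete, here is a minimal Python sketch of the arithmetic involved. It is not code from this PR: the helper names `dispatch_groups` and `speedup` and the element count of 4096 are hypothetical, chosen only for illustration. It assumes the usual compute-dispatch pattern in which each invocation handles one work item, so the number of workgroups is the item count divided by the workgroup size, rounded up. The timings are the ones quoted in the commit message.

```python
WORKGROUP_SIZE = 64  # value adopted by this commit; the previous value was 1


def dispatch_groups(total_items: int, workgroup_size: int = WORKGROUP_SIZE) -> int:
    """Workgroups needed so that every item gets one invocation (ceiling division)."""
    return (total_items + workgroup_size - 1) // workgroup_size


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of the old time to the new time."""
    return before_ms / after_ms


# Timings (ms) quoted in the commit message: (before, after).
TIMINGS_MS = {
    "Qualcomm Adreno x1-85": (721.13, 148.38),
    "NV RTX 2000 Ada": (87.66, 14.51),
    "Intel Xe": (76.30, 42.96),
}

if __name__ == "__main__":
    items = 4096  # hypothetical work-item count, not taken from the PR
    print(dispatch_groups(items, 1))   # 4096 workgroups of 1 thread each
    print(dispatch_groups(items, 64))  # 64 workgroups of 64 threads each
    for gpu, (before, after) in TIMINGS_MS.items():
        print(f"{gpu}: {speedup(before, after):.2f}x")
```

With these numbers the speedups come out to roughly 4.86x, 6.04x, and 1.78x. When the item count is not a multiple of 64, the last workgroup has spare invocations, so a shader written this way would typically need a bounds check on the global invocation index.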