onnxruntime
1ad9f121 - [webgpu] Use 64 as the workgroup size of DP4AMatMulQuantize (#24129)

Commit

132 days ago

[webgpu] Use 64 as the workgroup size of DP4AMatMulQuantize (#24129)

A workgroup size of 1 is usually a poor choice for a compute shader, because only one thread is active in each workgroup. This PR changes the workgroup size of DP4AMatMulQuantize to 64.

On Qualcomm Adreno x1-85 GPU: 721.13 ms -> 148.38 ms
On NVIDIA RTX 2000 Ada: 87.66 ms -> 14.51 ms
On Intel Xe GPU: 76.30 ms -> 42.96 ms
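To make the effect of the workgroup size concrete, here is a minimal Python sketch of the dispatch arithmetic and of the speedups implied by the timings above. It is not the onnxruntime implementation: `workgroups_needed`, `speedup`, and the element count are hypothetical names and values chosen for illustration, and the sketch assumes one shader invocation per element, which the commit message does not state.

```python
def workgroups_needed(total_invocations: int, workgroup_size: int) -> int:
    """Ceiling division: workgroups to dispatch so every invocation is covered."""
    if total_invocations < 0 or workgroup_size <= 0:
        raise ValueError("total_invocations must be >= 0 and workgroup_size > 0")
    return (total_invocations + workgroup_size - 1) // workgroup_size


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of the old time to the new time."""
    return before_ms / after_ms


# Hypothetical problem size (assumption, not taken from the commit).
total = 4096 * 128

# With workgroup size 1, each workgroup holds a single active thread.
groups_size_1 = workgroups_needed(total, 1)
# With workgroup size 64, each workgroup holds 64 threads, so far fewer are dispatched.
groups_size_64 = workgroups_needed(total, 64)

# Timings copied from the commit message (milliseconds, before and after).
timings_ms = {
    "Qualcomm Adreno x1-85": (721.13, 148.38),
    "NV RTX 2000 Ada": (87.66, 14.51),
    "Intel Xe": (76.30, 42.96),
}
speedups = {gpu: speedup(b, a) for gpu, (b, a) in timings_ms.items()}

if __name__ == "__main__":
    print(f"workgroups at size 1:  {groups_size_1}")
    print(f"workgroups at size 64: {groups_size_64}")
    for gpu, s in speedups.items():
        print(f"{gpu}: {s:.2f}x faster")
```

Under these assumptions the same work is covered by 64 times fewer workgroups, and the reported timings correspond to speedups of roughly 4.86x, 6.04x, and 1.78x on the three GPUs.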

References

#24129 - [webgpu] Use 64 as the workgroup size of DP4AMatMulQuantize

Author

qjia7

Parents

850be8e4
