onnxruntime
81fc3f1f - [WebGPU] Optimize GEMM with vec4 (#24478)

Commit
306 days ago
[WebGPU] Optimize GEMM with vec4 (#24478)

### Description

In this PR, we use vec4 to optimize GEMM when the column counts of A and B are both divisible by 4; otherwise we fall back to the previous shader. I will add a u32/vec2 implementation in the future, and at that point we will keep only one shader.

### Perf comparison

I ran a custom model containing only a GEMM (M = N = K = 1024) with Node.js on M2 and M3 Max. The result is roughly a 20% improvement.

| | !transA&&!transB | transA | transB | transA&&transB |
|------------------|------------|------------|----------------|------------|
| M2 | 9.36->7.41 | 9.45->7.54 | 11.21->8.19 | 9.66->8.37 |
| M3 Max | 8.07->6.99 | 7.54->6.53 | 8.42->5.89 | 5.47->5.29 |
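The dispatch rule in the description (use the 4-wide path only when the column counts of A and B are divisible by 4, otherwise use the previous shader) can be illustrated with a small CPU-side sketch. This is not the shader from this PR: the function names `can_use_vec4`, `gemm_scalar`, `gemm_vec4`, and `gemm` are hypothetical, the sketch covers only the non-transposed case, and it ignores alpha, beta, and the bias term C.

```python
def can_use_vec4(a_shape, b_shape):
    """Hypothetical dispatch check: both column counts must be divisible by 4."""
    return a_shape[1] % 4 == 0 and b_shape[1] % 4 == 0


def gemm_scalar(a, b):
    """Fallback path: plain triple loop computing Y = A @ B one element at a time."""
    m, k, n = len(a), len(b), len(b[0])
    y = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                y[i][j] += a[i][p] * b[p][j]
    return y


def gemm_vec4(a, b):
    """4-wide path: each inner step updates four adjacent output columns at once.

    Assumes the column count of B is a multiple of 4 (checked by the caller).
    """
    m, k, n = len(a), len(b), len(b[0])
    y = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(0, n, 4):
            acc = [0.0, 0.0, 0.0, 0.0]  # stands in for a vec4 accumulator
            for p in range(k):
                a_val = a[i][p]
                b_vec = b[p][j:j + 4]  # four adjacent values from row p of B
                acc = [acc[c] + a_val * b_vec[c] for c in range(4)]
            y[i][j:j + 4] = acc
    return y


def gemm(a, b):
    """Pick the 4-wide path when the shapes allow it, else the scalar path."""
    a_shape = (len(a), len(a[0]))
    b_shape = (len(b), len(b[0]))
    if can_use_vec4(a_shape, b_shape):
        return gemm_vec4(a, b)
    return gemm_scalar(a, b)
```

Both paths accumulate over the shared dimension in the same order, so they produce identical results; the 4-wide path simply handles four outputs per step, which is the kind of work a `vec4` operation does in a single instruction on the GPU.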