onnxruntime
7625a5e3 - [WebGPU-EP] Optimize subgroup_matrix_matmul_nbits on Intel (#25140)

Commit

249 days ago

[WebGPU-EP] Optimize subgroup_matrix_matmul_nbits on Intel (#25140)

This PR optimizes the Intel path for subgroup_matrix_matmul_nbits by removing the per-thread load of matrix A and instead using subgroupMatrixLoad directly from global memory, reducing SLM usage and bandwidth pressure.

- Removed var<workgroup> tile_A and the loadSHMA helper function.
- Updated inner loop to compute a global offset and call subgroupMatrixLoad on input_a.
- Adjusted indexing and stride parameters to match the global layout.
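
The offset-and-stride indexing described above can be illustrated with a small host-side model. This is only a sketch in Python, not the shader code from the PR: the helper names (load_tile, tile_offset), the tile sizes, and the row-major layout of input_a are assumptions made for illustration. It shows the general idea of reading a tile straight from a flat global buffer, given a start offset and a row stride, instead of first staging it in a shared tile.

```python
# Hypothetical illustration only: these helpers are NOT part of ONNX Runtime.
# Assumes input_a is an M x K matrix stored row-major in a flat buffer.

def tile_offset(tile_row, tile_col, tile_m, tile_k, k):
    """Flat index of the top-left element of tile (tile_row, tile_col)."""
    return tile_row * tile_m * k + tile_col * tile_k


def load_tile(buf, offset, stride, rows, cols):
    """Read a rows x cols tile from a flat buffer.

    Models a matrix load that takes a start offset and a row stride,
    so no intermediate copy of the tile is needed.
    """
    return [
        [buf[offset + r * stride + c] for c in range(cols)]
        for r in range(rows)
    ]


M, K = 4, 6              # assumed matrix shape
TILE_M, TILE_K = 2, 3    # assumed tile shape
input_a = list(range(M * K))

# Tile in the second tile-row and second tile-column.
offset = tile_offset(1, 1, TILE_M, TILE_K, K)
# The row stride equals K because the full matrix row is the global layout.
tile = load_tile(input_a, offset, K, TILE_M, TILE_K)
print(offset, tile)
```

In this model the stride is the full row length K rather than the tile width, which is the kind of adjustment needed when the source of the load changes from a packed tile to the global layout.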

References

#25140 - [WebGPU-EP] Optimize subgroup_matrix_matmul_nbits on Intel

Author

jchen10

Parents

293a5ac5
