onnxruntime
7625a5e3 - [WebGPU-EP] Optimize subgroup_matrix_matmul_nbits on Intel (#25140)

Commit
177 days ago
[WebGPU-EP] Optimize subgroup_matrix_matmul_nbits on Intel (#25140)

This PR optimizes the Intel path for subgroup_matrix_matmul_nbits by removing the per-thread load of matrix A and instead using subgroupMatrixLoad directly from global memory, reducing SLM usage and bandwidth pressure.

- Removed var<workgroup> tile_A and the loadSHMA helper function.
- Updated inner loop to compute a global offset and call subgroupMatrixLoad on input_a.
- Adjusted indexing and stride parameters to match the global layout.
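
The index arithmetic behind this change can be illustrated with a small CPU-side sketch. This is not the shader from the PR: the tile sizes, the row-major layout of A, and every function name below (tile_offset, load_tile_staged, load_tile_direct) are hypothetical, chosen only to show that reading a tile from the flat buffer with an offset and a row stride yields the same values as first copying it element by element into a scratch tile.

```python
# Hypothetical sizes, chosen only for illustration (not taken from the PR).
M, K = 16, 32            # A is M x K, stored row-major in one flat buffer
TILE_M, TILE_K = 8, 8    # assumed tile shape handled per load


def tile_offset(tile_row, k_idx, k_dim):
    """Flat index of the top-left element of a tile in row-major A."""
    return tile_row * TILE_M * k_dim + k_idx


def load_tile_staged(buf, offset, stride, rows, cols):
    """Staged pattern: copy each element into a scratch tile, then read it."""
    scratch = [0] * (rows * cols)
    for i in range(rows * cols):
        r, c = divmod(i, cols)
        scratch[i] = buf[offset + r * stride + c]
    return [scratch[r * cols:(r + 1) * cols] for r in range(rows)]


def load_tile_direct(buf, offset, stride, rows, cols):
    """Direct pattern: read the tile from the flat buffer via offset and stride."""
    return [[buf[offset + r * stride + c] for c in range(cols)]
            for r in range(rows)]


# Fill A with its own flat indices so mismatches are easy to spot.
a = list(range(M * K))

for tile_row in range(M // TILE_M):
    for k_idx in range(0, K, TILE_K):
        off = tile_offset(tile_row, k_idx, K)
        staged = load_tile_staged(a, off, K, TILE_M, TILE_K)
        direct = load_tile_direct(a, off, K, TILE_M, TILE_K)
        assert staged == direct
```

In this sketch the stride passed to the direct load is the full row length K of the source buffer, whereas a tile held in scratch storage would be read back with the tile width as its stride. That difference is one plausible reading of why the stride parameters had to be adjusted to match the global layout.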