onnxruntime
34a0b152 - [webgpu] Support any batch size for dp4a matmul path (#26884)

32 days ago
This pull request adds support for batched matrix multiplication in the DP4A quantized matmul WebGPU kernels and their associated C++ code and tests. The changes update the kernel code, tensor shapes, dispatch logic, and test infrastructure to properly handle a `batch_count` greater than 1, enabling efficient batched execution.
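To illustrate what handling a `batch_count` greater than 1 means for the indexing, here is a minimal CPU-side sketch of a batched int8 matmul built on a DP4A-style 4-way dot product. It is not the code from this commit: the names `Dp4a` and `BatchedInt8MatMul`, the `[batch_count, M, K]` by `[N, K]` layout, and the weights being shared across batches are all assumptions made for illustration, and quantization scales and block sizes are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Emulates a DP4A-style step: a 4-way int8 dot product accumulated into int32.
inline int32_t Dp4a(const int8_t* a, const int8_t* b, int32_t acc) {
  for (int i = 0; i < 4; ++i) {
    acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
  }
  return acc;
}

// Hypothetical reference, not the WebGPU kernel.
// a:   [batch_count, M, K] int8 activations
// b:   [N, K] int8 weights, shared by every batch (an assumption)
// out: [batch_count, M, N] int32 accumulators
std::vector<int32_t> BatchedInt8MatMul(const std::vector<int8_t>& a,
                                       const std::vector<int8_t>& b,
                                       std::size_t batch_count, std::size_t M,
                                       std::size_t N, std::size_t K) {
  assert(K % 4 == 0);  // each Dp4a call consumes four elements along K
  assert(a.size() == batch_count * M * K);
  assert(b.size() == N * K);

  std::vector<int32_t> out(batch_count * M * N, 0);
  for (std::size_t batch = 0; batch < batch_count; ++batch) {
    // The batch index only shifts the base offsets of the input and output.
    const int8_t* a_batch = a.data() + batch * M * K;
    int32_t* out_batch = out.data() + batch * M * N;
    for (std::size_t m = 0; m < M; ++m) {
      for (std::size_t n = 0; n < N; ++n) {
        int32_t acc = 0;
        for (std::size_t k = 0; k < K; k += 4) {
          acc = Dp4a(a_batch + m * K + k, b.data() + n * K + k, acc);
        }
        out_batch[m * N + n] = acc;
      }
    }
  }
  return out;
}
```

In this sketch the batch dimension is just an extra outer loop with per-batch base offsets; on a GPU the same index would more likely come from an additional dispatch dimension, though how the actual kernels map it is not stated here.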