onnxruntime
34a0b152 - [webgpu] Support any batch size for dp4a matmul path (#26884)

Commit

180 days ago

[webgpu] Support any batch size for dp4a matmul path (#26884)

This pull request adds support for batched matrix multiplication to the DP4A quantized matmul WebGPU kernels, along with the associated C++ code and tests. It updates the kernel code, tensor shapes, dispatch logic, and test infrastructure to handle a `batch_count` greater than 1, so the DP4A path can run batched inputs.
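As a rough illustration of what a batched DP4A matmul computes, the following is a minimal pure-Python reference sketch, not the kernel or C++ code from this commit. It assumes int8 inputs with `K` divisible by 4, a weight matrix shared across all batches, and no quantization scales. The helper names (`pack4`, `dp4a`, `pack_rows`, `batched_matmul_dp4a`) are hypothetical and chosen only for this example.

```python
import struct


def pack4(vals):
    """Pack four int8 values into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *vals))[0]


def dp4a(a_word, b_word, acc=0):
    """Emulate a 4-way int8 dot product with accumulate on packed words."""
    a = struct.unpack("<4b", struct.pack("<I", a_word))
    b = struct.unpack("<4b", struct.pack("<I", b_word))
    return acc + sum(x * y for x, y in zip(a, b))


def pack_rows(mat):
    """Pack each row of an int8 matrix into 32-bit words, 4 values per word."""
    return [[pack4(row[k:k + 4]) for k in range(0, len(row), 4)] for row in mat]


def batched_matmul_dp4a(a, b):
    """Compute out[batch][m][n] = sum_k a[batch][m][k] * b[n][k].

    a: [batch_count][M][K] int8 values (assumed layout)
    b: [N][K] int8 values, shared by every batch (assumed layout)
    """
    b_packed = pack_rows(b)
    out = []
    for a_batch in a:  # one iteration per batch; batch_count may exceed 1
        a_packed = pack_rows(a_batch)
        batch_out = []
        for a_row in a_packed:
            row_out = []
            for b_row in b_packed:
                acc = 0
                for a_word, b_word in zip(a_row, b_row):
                    acc = dp4a(a_word, b_word, acc)
                row_out.append(acc)
            batch_out.append(row_out)
        out.append(batch_out)
    return out


# Two batches, M = 1, K = 4, N = 2.
a = [[[1, 2, 3, 4]], [[-1, -2, -3, -4]]]
b = [[1, 1, 1, 1], [2, 0, -2, 0]]
result = batched_matmul_dp4a(a, b)
```

In this sketch the batch dimension is an outer loop. A GPU kernel would more plausibly map it onto the dispatch grid, but how this commit does so is not described in the message above.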

References

#26884 - [webgpu] Support any batch size for dp4a matmul path

Author

qjia7

Parents

5bc10a39
