[WebGPU EP] Add EINSUM implementation (#24358)
### Description
<!-- Describe your changes. -->
This PR adds a native implementation of the Einsum operator, based on and
expanded from the existing einsum.ts. All test cases in einsum_test.cc
pass.
The equation attribute of the Einsum op is a string consisting of a
left-hand side (LHS) and, optionally, a right-hand side (RHS) separated by
'->'. Examples:
- "ij->ji" matrix transpose
- "ii->i" diagonal elements of a square matrix
- "ij->" sum over all elements of a matrix
- "ij,jk->ik" explicit matrix multiplication
- "ij,jk" implicit matrix multiplication
- "ij,jk->" matrix multiplication and sum over all elements
- "ij,jk,kl->il" three matrix multiplication
- "...ij,...jk->...ik" batched matmul with broadcasting
- ",...i->...i" matrix element multiplication with one scalar
- "abc,cd->abc" keep the original abc matrix shape but matmul and sum
over along d
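To make the semantics of two of the equations above concrete, here is an illustration in plain Python loops (this is only a sketch of what the equations mean, not the WebGPU shader implementation):

```python
# Illustration only: the meaning of two equations from the list above,
# written as plain loops rather than the actual WebGPU kernels.

def matmul_einsum(a, b):
    """'ij,jk->ik': sum over the shared index j."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(cols):
            for j in range(inner):
                out[i][k] += a[i][j] * b[j][k]
    return out

def sum_all_einsum(a):
    """'ij->': every index is summed away, producing a scalar."""
    return sum(sum(row) for row in a)

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_einsum(a, b))  # [[19, 22], [43, 50]]
print(sum_all_einsum(a))    # 10
```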
The LHS consists of a sequence of terms separated by commas; each term
corresponds to one input variable. Each symbol in a term corresponds to a
dimension of that input: a letter ('a' to 'z' or 'A' to 'Z'), '...' to
represent an arbitrary number of dimensions, or an empty term to represent
a scalar.
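The grammar described above can be sketched as a small parser (the function and variable names here are illustrative, not taken from the PR's C++ code):

```python
# Hypothetical sketch of parsing the equation string as described above.

def parse_equation(equation):
    """Split an einsum equation into LHS terms and an optional RHS."""
    equation = equation.replace(" ", "")
    if "->" in equation:
        lhs, rhs = equation.split("->")
        explicit = True
    else:
        lhs, rhs, explicit = equation, None, False
    terms = lhs.split(",")  # one term per input variable
    for term in terms:
        # each term may contain letters and at most one '...'
        assert term.count("...") <= 1
    return terms, rhs, explicit

print(parse_equation("ij,jk->ik"))    # (['ij', 'jk'], 'ik', True)
print(parse_equation("ij,jk"))        # (['ij', 'jk'], None, False)
print(parse_equation(",...i->...i"))  # (['', '...i'], '...i', True)
```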
An empty RHS is handled differently in implicit vs. explicit mode.
- Implicit mode - the equation contains no arrow; e.g. the equation
"ij,jk" is equivalent to "ij,jk->ik", which is a matrix multiplication.
- Explicit mode - the equation contains an arrow; e.g. the equation
"ij,jk->" involves two steps: the first step is a matrix multiplication,
just like the implicit mode, and the second step sums the matrix produced
by the first step down to a scalar.
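The implicit-mode output inference above can be sketched with the standard einsum rule (assumed here to match NumPy/ONNX behavior): the output consists of every letter that appears exactly once across all LHS terms, sorted alphabetically; ellipsis handling is omitted for brevity.

```python
# Sketch of the standard implicit-mode rule: output = letters appearing
# exactly once across the LHS terms, in alphabetical order.
# '...' handling is omitted for brevity.

def infer_implicit_output(lhs_terms):
    counts = {}
    for term in lhs_terms:
        for symbol in term:
            counts[symbol] = counts.get(symbol, 0) + 1
    return "".join(sorted(s for s, c in counts.items() if c == 1))

print(infer_implicit_output(["ij", "jk"]))  # 'ik', same as "ij,jk->ik"
print(infer_implicit_output(["ii"]))        # '', i.e. the trace of a square matrix
```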
For the complete set of test cases, please refer to einsum_test.cc.