[NNAPI EP] Add some support for MatMul with batch inputs (#12261)
MatMul allows multiplying batches of matrices. This change enables limited support of batch inputs in the NNAPI EP.
Some limitations:
- Broadcasting is not supported. A and B must have the same leading dimensions.
- Only float inputs are supported. QDQ MatMul or QLinearMatMul with batch inputs is not supported yet.
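
To illustrate the first limitation, here is a minimal sketch of a shape check that accepts only inputs with identical leading dimensions. `HasMatchingBatchDims` is a hypothetical helper written for this description, not the actual NNAPI EP code, and it assumes both inputs have a rank of at least 3.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only, not the actual NNAPI EP check.
// Returns true when A is [..., M, K] and B is [..., K, N] with identical
// leading (batch) dimensions, i.e. no broadcasting is required.
bool HasMatchingBatchDims(const std::vector<int64_t>& a_shape,
                          const std::vector<int64_t>& b_shape) {
  const std::size_t rank = a_shape.size();

  // Assumption: a batch input has rank >= 3. Differing ranks would
  // require broadcasting, which is not supported.
  if (rank < 3 || b_shape.size() != rank) return false;

  // Every dimension except the last two is a batch dimension.
  for (std::size_t i = 0; i + 2 < rank; ++i) {
    if (a_shape[i] != b_shape[i]) return false;
  }

  // The inner dimensions must agree: K of A equals K of B.
  return a_shape[rank - 1] == b_shape[rank - 2];
}
```

For example, shapes `{2, 3, 4}` and `{2, 4, 5}` pass this check, while `{2, 3, 4}` and `{1, 4, 5}` do not, because the latter pair would need broadcasting over the batch dimension.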
Note that NNAPI's ANEURALNETWORKS_BATCH_MATMUL does essentially what we need, but it is only available from NNAPI feature level 6. As a workaround, this change composes several existing NNAPI operations to achieve a similar result, which is not ideal.
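
One general way to get the effect of a batched MatMul from 2-D building blocks is to multiply each batch slice separately and write the results back to back. The sketch below shows that idea on plain row-major float buffers with a single flattened batch dimension. It does not call NNAPI and is not necessarily the decomposition this change uses; `MatMul2D` and `BatchMatMul` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain 2-D product on row-major buffers: C[M, N] = A[M, K] * B[K, N].
void MatMul2D(const float* a, const float* b, float* c,
              std::size_t m, std::size_t k, std::size_t n) {
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float sum = 0.0f;
      for (std::size_t p = 0; p < k; ++p) {
        sum += a[i * k + p] * b[p * n + j];
      }
      c[i * n + j] = sum;
    }
  }
}

// Hypothetical illustration: A is [batch, M, K], B is [batch, K, N].
// Each batch slice is multiplied independently and the outputs are
// stored consecutively, giving C with shape [batch, M, N].
std::vector<float> BatchMatMul(const std::vector<float>& a,
                               const std::vector<float>& b,
                               std::size_t batch, std::size_t m,
                               std::size_t k, std::size_t n) {
  // Same batch size on both sides: no broadcasting.
  assert(a.size() == batch * m * k);
  assert(b.size() == batch * k * n);

  std::vector<float> c(batch * m * n);
  for (std::size_t i = 0; i < batch; ++i) {
    MatMul2D(a.data() + i * m * k,
             b.data() + i * k * n,
             c.data() + i * m * n,
             m, k, n);
  }
  return c;
}
```

Leading dimensions of higher-rank inputs can be flattened into the single `batch` count before calling `BatchMatMul`, which is only valid because A and B are required to share the same leading dimensions.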