[mlir][ArmSME] Audit ArmSME load/store ops (#139573)
This patch updates the following ArmSME ops to require that input and
output element types match:
* `arm_sme.tile_load`, `arm_sme.tile_store`,
`arm_sme.tile_load_slice`, `arm_sme.tile_store_slice`.
In addition, it ensures that the base memref operand for `tile_load` and
`tile_store` is always rank-2, aligning with the semantics of Arm SME
tiles (always rank-2). This change is effectively a follow-up to
#135151:
* "[mlir][vector] Tighten the semantics of vector.{load|store}"
The patch also updates `createLoadStoreForOverTileSlices` in
ArmSMEToSCF.cpp to fail when processing invalid tile stores like the
following:
```mlir
arm_sme.tile_store %arg0, %arg1[%c0] : memref<?x4xi8>, vector<[4]x[4]xi32>
```
This particular change fixes #118769. As noted in the TODO, we should
further extend op verification logic — I plan to address that in a
follow-up patch.