[Pallas:MGPU] Add support for slicing WGMMA accumulator under WG semantics.
We lower a WGMMA that accumulates into a slice of the accumulator to:
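```
// Read the sliced region out of the full accumulator.
acc_slice = vector.extract_strided_slice(acc, ...)
// Run the WGMMA on the slice only.
new_acc = mgpu.dialect.wgmma(acc_slice, ...)
// Write the updated slice back into the full accumulator.
acc = vector.insert_strided_slice(new_acc, acc, ...)
```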
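As a rough semantic model (not the actual lowering or any MGPU API), the extract / wgmma / insert sequence behaves like the NumPy sketch below. It assumes a 2-D accumulator, unit strides, and a slice shaped `(a.shape[0], b.shape[1])`; `wgmma_reference` and `wgmma_on_acc_slice` are hypothetical helpers, and `wgmma_reference` simply computes `acc + a @ b` in place of `mgpu.dialect.wgmma`.

```python
import numpy as np


def wgmma_reference(acc, a, b):
    """Hypothetical stand-in for mgpu.dialect.wgmma: returns acc + a @ b."""
    return acc + a @ b


def wgmma_on_acc_slice(acc, a, b, offsets):
    """Hypothetical model of the extract / wgmma / insert lowering.

    offsets: (row, col) start of the slice inside `acc`; strides are assumed to be 1.
    """
    r0, c0 = offsets
    m, n = a.shape[0], b.shape[1]
    # Models vector.extract_strided_slice: copy out the sub-tile of the accumulator.
    acc_slice = acc[r0:r0 + m, c0:c0 + n].copy()
    # Models mgpu.dialect.wgmma: accumulate a @ b into the extracted slice only.
    new_acc = wgmma_reference(acc_slice, a, b)
    # Models vector.insert_strided_slice: produce a new accumulator value with
    # the updated tile written back (vector ops have value semantics, so the
    # input `acc` is left untouched).
    out = acc.copy()
    out[r0:r0 + m, c0:c0 + n] = new_acc
    return out


# Usage: update only the bottom-right 64x128 tile of a 128x256 accumulator.
acc = np.zeros((128, 256), dtype=np.float32)
a = np.ones((64, 32), dtype=np.float32)
b = np.ones((32, 128), dtype=np.float32)
out = wgmma_on_acc_slice(acc, a, b, (64, 128))
```

Only the selected tile changes (each element becomes 32, the contraction length); the rest of `out` keeps the original accumulator values.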
PiperOrigin-RevId: 903245417