[MLIR][XeGPU][VectorToXeGPU] Fix transfer_read/write cases with non-contiguous memrefs (#158126)
This PR fixes the case where the source memref of
`vector.transfer_read/write` is not contiguous, which violates the
semantics of `memref.collapse_shape`, the op used in the lowering.
<details><summary>An example of a failing test</summary>
```mlir
gpu.module @xevm_module {
  gpu.func @load_from_subview(%source: memref<4096x4096xf16>, %off1: index, %off2: index) -> vector<8xf16> {
    %c0 = arith.constant 0.0 : f16
    %subview = memref.subview %source[%off1, %off2] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1], offset: ?>>
    %0 = vector.transfer_read %subview[%off2, %off2], %c0
      {in_bounds = [true]} : memref<256x256xf16, strided<[4096, 1], offset: ?>>, vector<8xf16>
    gpu.return %0 : vector<8xf16>
  }
}
```
Fails with:
```
/home/user/llvm/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir:404:8: error: 'memref.collapse_shape' op invalid source layout map or collapsing non-contiguous dims
%0 = vector.transfer_read %subview[%off2, %off2], %c0
^
/home/user/llvm/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir:404:8: note: see current operation: %8 = "memref.collapse_shape"(%2) <{reassociation = [[0, 1]]}> : (memref<256x256xf16, strided<[4096, 1], offset: ?>>) -> memref<65536xf16>
```
</details>
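To illustrate why the subview in the failing test cannot be collapsed, here is a minimal Python sketch of a contiguity check. It is a simplification and not the actual `memref.collapse_shape` verifier; the helper names `row_major_strides` and `is_contiguous` are hypothetical.

```python
def row_major_strides(shape):
    """Strides (in elements) of a contiguous row-major buffer of this shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def is_contiguous(shape, strides):
    """Simplified check: the strides must match the dense row-major strides."""
    return list(strides) == row_major_strides(shape)


# The original 4096x4096 buffer is dense, so its dims can be collapsed.
print(is_contiguous([4096, 4096], [4096, 1]))  # True

# The 256x256 subview keeps the parent's row stride of 4096 instead of 256,
# so consecutive rows are not adjacent in memory and collapsing is invalid.
print(is_contiguous([256, 256], [4096, 1]))  # False
```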
Following a suggestion, this PR replaces `memref.collapse_shape` with
`memref.extract_aligned_pointer_as_index`.
Since `memref.extract_aligned_pointer_as_index` applied to a subview
returns the original base pointer without the subview offset, this PR
also adds logic that includes the offset obtained from
`memref.extract_strided_metadata` in the `baseOffset` calculation in
`computeOffsets`.
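
The offset arithmetic can be sketched in Python as follows. This only models the index math on plain integers, assuming row-major strides in elements; the actual `computeOffsets` builds IR values, and the function names and the sample offsets `off1 = 2`, `off2 = 3` are hypothetical.

```python
def subview_offset(parent_offset, parent_strides, sub_offsets):
    """Element offset of a subview's origin relative to the base pointer."""
    return parent_offset + sum(o * s for o, s in zip(sub_offsets, parent_strides))


def linear_element_offset(base_offset, strides, indices):
    """Element offset of memref[indices] relative to the base pointer."""
    return base_offset + sum(i * s for i, s in zip(indices, strides))


# Hypothetical runtime values for %off1 and %off2 from the failing test.
off1, off2 = 2, 3
strides = [4096, 1]

# Offset that extract_strided_metadata would report for the subview.
sv_offset = subview_offset(0, strides, [off1, off2])

# transfer_read %subview[%off2, %off2]: the subview offset must be included.
with_offset = linear_element_offset(sv_offset, strides, [off2, off2])
# Ignoring the subview offset addresses a different element of %source.
without_offset = linear_element_offset(0, strides, [off2, off2])

print(sv_offset, with_offset, without_offset)  # 8195 20486 12291
```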
---------
Signed-off-by: dchigarev <dmitry.chigarev@intel.com>