[MLIR][XeGPU] Allow load/store/prefetch uses [memref+offset] instead of tdesc (#150576)
Add variant of load/store/prefetch to allow offset. The new xegpu.load
variant accepts memref+offset, and the existing tdesc operand will be
removed in the future PR.
The semantics are combination of "creating scattered_tdesc + xegpu.load
with scattered_tdesc". The current xegpu.load accepts tdesc operand,
which encapsulates "memref+offset". This PR "fold" "memref+offset"
directly to xegpu.load replacing "tdesc". Create_tdesc will be removed
as scatter_tdesc only contains base address after offsets being taken
away, so there is no point to keep it.
```mlir
// wi level code example
%2 = xegpu.load %src[%offsets], %mask <{chunk_size = 2}> : ui64, vector<1xindex>, vector<1xi1> -> vector<2xf32>
xegpu.store %val, %src[%offsets], %mask: vector<1xf16>, memref<?xf16>, vector<1xindex>, vector<1xi1>
xegpu.prefetch %src[%0] : ui64, vector<1xindex>
```