[MLIR][XeGPU] Extend op definitions to support 3D+: load_nd, store_nd, prefetch_nd (#199811)
**Summary**
Extend xegpu.load_nd, xegpu.store_nd, and xegpu.prefetch_nd operations
to support 3D and higher-dimensional tensor descriptors with batch
dimensions, enabling batched memory operations for workloads like [4, 8,
16] tensor loads/stores.
**Changes**
- Verifiers: Removed rank > 2 checks in LoadNdOp::verify() and
StoreNdOp::verify() to allow 3D+ tensor descriptors
- Documentation: Added comprehensive documentation explaining: Tensor
descriptors can be 1D, 2D, 3D, or higher dimensional; Batch dimensions
(leading dimensions) are unrolled to unit dimensions during lowering;
Operations execute at 2D granularity at subgroup level to match 2D block
IO hardware; Examples of 3D operations
- Tests: Added unit tests for 3D operations (load_nd_3d, store_nd_3d,
prefetch_nd_3d)
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>