[Mosaic GPU] Annotate GMEM pointers as 256-byte aligned
cudaMalloc returns pointers aligned to at least 256 bytes, and the XLA:GPU
runtime upholds the same guarantee. Without the annotation, LLVM cannot assume
that alignment and would sometimes conservatively avoid emitting vectorized
loads and stores to GMEM.
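
For illustration only, here is a host-side C++ analogue of the same idea. It is
a sketch, not the code in this change: `copy_aligned` and `roundtrip_ok` are
hypothetical names, and it uses the GCC/Clang builtin `__builtin_assume_aligned`
rather than whatever annotation Mosaic GPU attaches to its kernel arguments.
The alignment promise is what gives the optimizer license to pick wide, aligned
memory operations. Whether it actually does depends on the compiler and target.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: copies n floats while promising the compiler that both
// pointers are 256-byte aligned. Behavior is undefined if the promise is false.
void copy_aligned(float* dst, const float* src, std::size_t n) {
  float* d = static_cast<float*>(__builtin_assume_aligned(dst, 256));
  const float* s =
      static_cast<const float*>(__builtin_assume_aligned(src, 256));
  for (std::size_t i = 0; i < n; ++i) {
    d[i] = s[i];
  }
}

// Hypothetical driver: allocates two 256-byte-aligned buffers, copies one into
// the other, and reports whether the alignment held and the copy was exact.
bool roundtrip_ok() {
  constexpr std::size_t kN = 1024;  // 4096 bytes, a multiple of 256
  float* src = static_cast<float*>(std::aligned_alloc(256, kN * sizeof(float)));
  float* dst = static_cast<float*>(std::aligned_alloc(256, kN * sizeof(float)));
  if (src == nullptr || dst == nullptr) {
    std::free(src);
    std::free(dst);
    return false;
  }
  bool ok = reinterpret_cast<std::uintptr_t>(src) % 256 == 0 &&
            reinterpret_cast<std::uintptr_t>(dst) % 256 == 0;
  for (std::size_t i = 0; i < kN; ++i) {
    src[i] = static_cast<float>(i);
  }
  copy_aligned(dst, src, kN);
  for (std::size_t i = 0; i < kN; ++i) {
    ok = ok && (dst[i] == src[i]);
  }
  std::free(src);
  std::free(dst);
  return ok;
}
```

The promise is only sound because the allocator guarantees the alignment, which
mirrors the reasoning above: the annotation is safe because cudaMalloc and the
XLA:GPU runtime already uphold it.
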
PiperOrigin-RevId: 796845444