[Pallas:MGPU] Force alignment of SMEM allocations to 1024 bytes
This prevents small buffers from throwing off the alignment of the large TMA and WGMMA
operands allocated after them. The alignment scheme should be made more refined in the future, but this is sufficient for now.
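For illustration only, a minimal sketch of the idea, assuming a simple bump allocator that places buffers one after another. The names ALIGNMENT, align_up, and assign_offsets are hypothetical and do not reflect the actual Pallas Mosaic GPU allocator. Without rounding, a 16-byte buffer would push the next large operand to offset 16. With rounding, every allocation starts on a 1024-byte boundary.

```python
# Hypothetical sketch, not the actual Pallas:MGPU allocator.
ALIGNMENT = 1024  # bytes; every SMEM allocation starts on this boundary


def align_up(offset: int, alignment: int = ALIGNMENT) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def assign_offsets(sizes):
    """Bump-allocate buffers in order, aligning each start to ALIGNMENT.

    Returns the list of start offsets and the total bytes used.
    """
    offsets = []
    offset = 0
    for size in sizes:
        offset = align_up(offset)  # pad so this buffer starts aligned
        offsets.append(offset)
        offset += size
    return offsets, offset


# A small 16-byte buffer no longer misaligns the 32 KiB operand after it.
offsets, total = assign_offsets([16, 32768, 8])
```

The trade-off is wasted SMEM padding after small buffers, which is why a more refined scheme may be worthwhile later.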
PiperOrigin-RevId: 687264994