[NVPTX][NFC] Rearrange the TMA-S2G intrinsics (#144903)
This patch moves the TMA S2G (shared-to-global) intrinsics into their own
set of loops. This prepares for adding support for the im2colw/w128 modes
to the G2S intrinsics, which the S2G intrinsics do not support.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>