[MLIR][NVVM] Add Op for TMA Store with reduction (#118853)
PR #116854 adds intrinsics for TMA Store with reduction.
This patch adds an NVVM Dialect Op for the same.
* Lit tests are added to verify the lowering to LLVM intrinsics and
to check that invalid cases are rejected.
* The common verifier method is updated to handle im2col modes without
offsets. This helps Ops such as TMA Store and TMA StoreReduce, which
support im2col mode but take no im2col offsets.
* The nvvmir.mlir test file is already large, so this patch adds the
tests for this Op in a new file under a separate "nvvm/" directory:
[mlir/test/Target/LLVMIR/"nvvm"/tma_store_reduce.mlir]
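For illustration only, the verifier change in the second bullet can be
modelled as the standalone sketch below. It is not the code from this
patch: the function name `verifyTmaTensorCommon`, its signature, and the
error strings are hypothetical, and the limits (1 to 5 tensor dimensions,
at least 3 dimensions for im2col, offsets equal to dimensions minus 2)
are assumptions taken from a reading of the PTX spec. The point it shows
is that an offset count of zero skips the offset-count check, so Ops
without im2col offsets can share the same verifier.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical standalone model of a shared TMA tensor verifier.
// Returns an error message on failure, std::nullopt on success.
// All names and limits here are assumptions, not the patch's code.
std::optional<std::string> verifyTmaTensorCommon(std::size_t tensorDims,
                                                 bool isIm2Col,
                                                 std::size_t numIm2ColOffsets) {
  // Assumed limit: tensor coordinates cover 1 to 5 dimensions.
  if (tensorDims < 1 || tensorDims > 5)
    return "expects between 1 and 5 tensor coordinates";

  if (isIm2Col) {
    // Assumed limit: im2col mode needs at least a 3-D tensor.
    if (tensorDims < 3)
      return "im2col mode requires at least a 3-dimensional tensor";

    // Only check the offset count when offsets are present. Ops that
    // take no im2col offsets pass 0 and skip this check.
    if (numIm2ColOffsets != 0 && tensorDims != numIm2ColOffsets + 2)
      return "im2col offsets must number 2 less than the coordinates";
  }
  return std::nullopt;
}
```

Under these assumptions, a store-style Op would call
`verifyTmaTensorCommon(dims, isIm2Col, 0)`, while an Op that carries
offsets would pass the real offset count.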
PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-reduce-async-bulk-tensor
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>