[MLIR][NVVM] Add Permute Op (#169793)
This patch adds the `permute` op to the NVVM dialect, modeling the PTX
`prmt` (byte permute) instruction.
Lit tests verify the lowering to the corresponding intrinsics.
Negative tests check that invalid combinations are rejected with an
error.
PTX spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-prmt
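For readers unfamiliar with `prmt`, here is a minimal reference sketch of
its default (generic) mode as described in the PTX spec. It is an
illustration only, not code from this patch: the function name
`prmt_default` is hypothetical, and the other `prmt` modes are not covered.

```python
def prmt_default(a: int, b: int, selector: int) -> int:
    """Illustrative model of PTX prmt in its default (generic) mode.

    Hypothetical helper for explanation only; not part of the patch.
    """
    # The eight source bytes are numbered 0-7: bytes 0-3 come from a,
    # bytes 4-7 come from b.
    pool = ((b & 0xFFFFFFFF) << 32) | (a & 0xFFFFFFFF)
    result = 0
    for i in range(4):
        # Only the low 16 bits of the selector are used: one 4-bit
        # selection value per destination byte.
        nibble = (selector >> (4 * i)) & 0xF
        byte = (pool >> (8 * (nibble & 0x7))) & 0xFF
        if nibble & 0x8:
            # msb of the nibble set: replicate the selected byte's sign
            # bit across all 8 bits instead of copying the byte.
            byte = 0xFF if byte & 0x80 else 0x00
        result |= byte << (8 * i)
    return result
```

For example, with `a = 0x33221100` and `b = 0x77665544`, the selector
`0x3210` returns `a` unchanged and `0x7654` returns `b`.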
Signed-off-by: Dharuni R Acharya <dharunira@nvidia.com>