[MLIR][NVVM] Fix the lowering of mbarrier.test.wait (#166555)

PR #165993 accidentally broke the lowering of the `mbarrier.test.wait` Op.
This patch fixes the issue. It also adds tests that verify the lowering of
all mbarrier Ops to intrinsics, so that similar regressions are caught in
the future.

Additionally, the `cp-async-mbarrier` test is moved to the
`mbarriers.mlir` test file to keep all mbarrier-related tests together.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>