[mlir][amdgpu] Add `rocdl.s.waitcnt` wrapper (#149670)
The main motivation is to pass vmcnt/expcnt/lgkmcnt values directly
(similar to the asm format) and delegate architecture-dependent
bitpacking to the amdgpu->rocdl lowering.
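
For illustration only, a rough Python sketch of the kind of
architecture-dependent bitpacking the lowering takes over. The function
names are hypothetical and the field layouts are assumptions (a
gfx9-style and a gfx11-style encoding of the s_waitcnt immediate); this
is not the code added by this change.

```python
def pack_waitcnt_gfx9(vmcnt: int, expcnt: int, lgkmcnt: int) -> int:
    """Pack counters with an ASSUMED gfx9-style layout (hypothetical helper).

    vmcnt low 4 bits -> bits 3:0, vmcnt high 2 bits -> bits 15:14,
    expcnt -> bits 6:4, lgkmcnt -> bits 11:8.
    """
    if not (0 <= vmcnt < 64 and 0 <= expcnt < 8 and 0 <= lgkmcnt < 16):
        raise ValueError("counter out of range for the assumed gfx9 layout")
    return (vmcnt & 0xF) | ((vmcnt >> 4) << 14) | (expcnt << 4) | (lgkmcnt << 8)


def pack_waitcnt_gfx11(vmcnt: int, expcnt: int, lgkmcnt: int) -> int:
    """Pack counters with an ASSUMED gfx11-style layout (hypothetical helper).

    expcnt -> bits 2:0, lgkmcnt -> bits 9:4, vmcnt -> bits 15:10.
    """
    if not (0 <= vmcnt < 64 and 0 <= expcnt < 8 and 0 <= lgkmcnt < 64):
        raise ValueError("counter out of range for the assumed gfx11 layout")
    return expcnt | (lgkmcnt << 4) | (vmcnt << 10)
```

The same (vmcnt, expcnt, lgkmcnt) triple yields different immediates
under the two assumed layouts, which is why the wrapper takes the plain
counter values and leaves the packing to the lowering.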
---------
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>