[ROCDL] Add dot intrinsics to rocdl (#193129)
This patch adds dot intrinsic support to the rocdl dialect. Having these
intrinsics (and the follow-up `amdgpu` wrapper) as first-class citizens in
MLIR will allow us to lower thread-local reductions over data of 16 bits
or fewer more effectively. This is in the spirit of the `dot` intrinsic
support in existing edge dialects (`x86`, `nvvm`, `spirv`).
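
For readers unfamiliar with this class of intrinsic, the sketch below is a scalar reference for what a 4-way signed 8-bit dot-accumulate computes. It is illustrative only: the name `sdot4_ref`, the lane ordering (lane 0 in the low byte), and the absence of any clamp or saturation handling are assumptions, not the definition of any op added by this patch.

```cpp
#include <cstdint>

// Hypothetical scalar reference, not an op from this patch.
// Treats `a` and `b` as four packed signed 8-bit lanes (lane 0 = low byte),
// multiplies them lane by lane, and adds the products to `acc`.
// Overflow and clamp behaviour are deliberately not modeled.
inline int32_t sdot4_ref(uint32_t a, uint32_t b, int32_t acc) {
    int32_t sum = acc;
    for (int lane = 0; lane < 4; ++lane) {
        // Extract one byte from each operand and reinterpret it as signed.
        int8_t ai = static_cast<int8_t>((a >> (8 * lane)) & 0xFFu);
        int8_t bi = static_cast<int8_t>((b >> (8 * lane)) & 0xFFu);
        sum += static_cast<int32_t>(ai) * static_cast<int32_t>(bi);
    }
    return sum;
}
```

A thread-local reduction over narrow elements can then consume four elements per step instead of one, which is the kind of lowering the paragraph above has in mind.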
Assisted by: Claude
---------
Signed-off-by: Eric Feng <Eric.Feng@amd.com>