[mlir][ROCDL] Add fp4 and fp6 conversion intrinsics, fix fp8 immargs (#140801)
This PR adds support for the scaled conversion intrinsics for fp4 and
fp6 types so that they can be targetted by a future amdgpu dialect op or
used directly.
Additionally, this patch refactors the copy-paste-heavy fp8 versions of
these scaled conversion intrinsics with tablegen `foreach` loops, and
fixes the fact that certain immargs weren't being stored as attributes.
Note that some of the MLIR-level tests for those scaled fp8 intrinsics
had incorrect return types, which have been fixed.
(Note that while the operations have a known return type, the IR format
still prints that type for clarity).