[ROCDL] Added `rocdl.cvt.scale.pk8` ops (#161411)
This patch adds FP conversion instructions that were missing from the
ROCDL dialect.
Specifically:
- Downscaling 8x packed F16, BF16, and FP32 values to FP8, BF8, and FP4
Tests:
- Added lit tests to check the MLIR -> LLVM lowering