[ROCDL] Added rocdl.cvt.scale.sr.pk8 ops (#162244)
This patch adds missing FP conversion ops to the ROCDL dialect for the
GFX1250 architecture.
Specifically, the new ops downscale 8x packed F16, BF16, or F32 values
to FP8, BF8, or FP4 with stochastic rounding.
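Stochastic rounding is what separates these ops from a plain downconversion, so a small model may help. The sketch below is a conceptual Python model of the FP4 (E2M1) case only. It is not the GFX1250 instruction semantics. `stochastic_round_e2m1`, `scale_sr_pk8` and `pack_e2m1x8` are hypothetical names, the scale is assumed to divide the input, random bits come from Python's `random` module rather than whatever source the hardware uses, and the nibble order of the packed word is assumed. FP8/BF8 outputs would follow the same pattern with a larger table of representable values.

```python
import bisect
import random

# Non-negative magnitudes representable in FP4 E2M1, in code order:
# the 3-bit magnitude code of each value is its index in this list.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def stochastic_round_e2m1(x: float, rng: random.Random) -> float:
    """Round x to one of its two neighbouring E2M1 values.

    Rounds up with probability equal to the fractional distance from the
    lower neighbour, so the result is unbiased in expectation.
    Magnitudes >= 6.0 (including infinities) saturate; NaN is not handled.
    """
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    if mag >= E2M1_MAGNITUDES[-1]:
        return sign * E2M1_MAGNITUDES[-1]
    # Index of the first representable magnitude strictly greater than mag.
    hi = bisect.bisect_right(E2M1_MAGNITUDES, mag)
    lower, upper = E2M1_MAGNITUDES[hi - 1], E2M1_MAGNITUDES[hi]
    p_up = (mag - lower) / (upper - lower)
    return sign * (upper if rng.random() < p_up else lower)


def scale_sr_pk8(values, scale, rng):
    """Hypothetical model of one pk8 conversion: 8 inputs -> 8 E2M1 values.

    ASSUMPTION: the scale divides the input; the real op may differ.
    """
    if len(values) != 8:
        raise ValueError("expected exactly 8 packed inputs")
    return [stochastic_round_e2m1(v / scale, rng) for v in values]


def pack_e2m1x8(values):
    """Pack 8 E2M1 values into a 32-bit word.

    ASSUMPTION: element 0 goes in the low nibble.
    """
    word = 0
    for i, v in enumerate(values):
        code = E2M1_MAGNITUDES.index(abs(v))
        if v < 0:  # sign bit; -0.0 is packed as +0 here
            code |= 0x8
        word |= code << (4 * i)
    return word
```

For example, 0.25 lies exactly halfway between 0.0 and 0.5. Round-to-nearest-even always returns 0.0, whereas `stochastic_round_e2m1` returns 0.0 or 0.5 with equal probability, so the average over many conversions stays close to 0.25.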
Tests:
Added lit tests that check the MLIR -> LLVM lowering.