[mlir][amdgpu] Add support for multi-dim arith.truncf/extf fp8 lowering (#98074)
The existing `fp8` lowering from `arith` to `amdgpu` bails out when the
operand is a multi-dimensional vector. Handle this case by using
`vector.shape_cast` to collapse the input to 1-D before elements are
extracted, reusing the existing 1-D lowering, and then casting the
result back to the desired output shape.
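
As a rough illustration of the idea (not the exact IR the pass emits),
a 2-D `arith.truncf` could be handled along these lines. The shapes,
the `f8E4M3FNUZ` element type, and the SSA names are assumptions picked
for the example, and the 1-D lowering itself is elided:

```mlir
// Before: a 2-D truncation that the 1-D-only lowering used to skip.
//   %res = arith.truncf %in : vector<2x4xf32> to vector<2x4xf8E4M3FNUZ>

// After (conceptually): collapse the operand to a 1-D vector ...
%flat = vector.shape_cast %in : vector<2x4xf32> to vector<8xf32>
// ... run the existing 1-D fp8 lowering on %flat, yielding %flat_res
// (hypothetical name; the packed conversion ops are omitted here) ...
// ... then restore the original 2-D shape.
%res = vector.shape_cast %flat_res
    : vector<8xf8E4M3FNUZ> to vector<2x4xf8E4M3FNUZ>
```

Since `vector.shape_cast` only reinterprets the shape and keeps the
element count, the 1-D path can be reused unchanged.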