[MLIR][NVVM][NVPTX] Support for new mma/mma.sp variants from PTX 9.1 (#182325)
This change adds support for the `mma`/`mma.sp` variants introduced in
[PTX ISA
9.1](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html?highlight=mma%2520sp#ptx-isa-version-9-1):
`.scale_vec::4X` with `.ue8m0` as `.stype` and `.kind::mxf4nvf4`.
It also updates the MLIR `mma`/`mma.sp` block-scale tests to use structs
instead of vectors.