[MLIR][NVVM] Add LLVMIR lowering for nvvm.subf (#184968)
This change adds a direct LLVMIR lowering for the `nvvm.subf` operation
(added in #179162) so that translation no longer fails when
canonicalization is not run. It also adds `mlir-translate` tests for
`nvvm.subf`.
PTX ISA Reference:
1. https://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions-sub
2. https://docs.nvidia.com/cuda/parallel-thread-execution/#half-precision-floating-point-instructions-sub