[NVPTX] Add IR pass for FMA transformation in the llc pipeline (#154735)
This change introduces a new IR pass in the llc pipeline for NVPTX that
transforms sequences of FMUL followed by FADD or FSUB into a single FMA
instruction.
Currently, all FMA folding for NVPTX occurs at the DAGCombine stage,
which is too late for any IR-level passes that might want to optimize or
analyze FMAs. By moving this transformation earlier into the IR phase,
we enable more opportunities for FMA folding, including across basic
blocks.
Additionally, this new pass relies on the contract instruction level
fast-math flag to perform these transformations, rather than depending
on the -fp-contract=fast or -enable-unsafe-fp-math options passed to
llc.