[flang][cuda] CUF kernel loop directive (#82836)

This patch introduces a new operation to represent the CUDA Fortran
kernel loop directive. The operation is modeled as a LoopLikeOp,
similar to acc.loop.
The CUFKernelDoConstruct parse tree node is also handled in the
PFTBuilder so that it is available in PFT evaluations.
Lowering from the flang parse tree to MLIR is also implemented.
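
For context, the directive this patch targets looks roughly like the
following in CUDA Fortran source. This is an illustrative sketch only,
not taken from the patch or its tests; the program name, array names,
and sizes are made up.

```fortran
program cuf_kernel_example
  implicit none
  integer, parameter :: n = 1024
  real, device :: a(n), b(n)   ! device-resident arrays (hypothetical names)
  integer :: i

  ! Initialize the device arrays from the host.
  a = 1.0
  b = 2.0

  ! Kernel loop directive: the compiler generates a device kernel from
  ! the loop that follows. do(1) applies the directive to one loop
  ! level, and the `*` entries let the compiler choose the grid and
  ! block dimensions.
  !$cuf kernel do(1) <<<*, *>>>
  do i = 1, n
    a(i) = a(i) + b(i)
  end do
end program cuf_kernel_example
```

The directive together with the loop it annotates is what the
CUFKernelDoConstruct parse tree node represents.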