[flang][cuda] Lower simple host to device data transfer (#85960)
In CUDA Fortran, data transfers can be expressed as assignment statements
between host and device variables.
This patch introduces a `fir.cuda_data_transfer` operation that
materializes the data transfer between two memory references.
Simple host-to-device transfers that do not involve descriptors are also
lowered in this patch. When the rhs is an expression that requires
evaluation, a temporary is created: the expression is evaluated on the
host into that temporary, and the transfer is then initiated.
Implicit transfers, which occur when device symbols are present on the
rhs, are not part of this patch. Device-to-host transfers are not part of
this patch either.
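For illustration, here is a minimal sketch of the kinds of assignments described above. It assumes the file is compiled as CUDA Fortran (for example, a `.cuf` source). The subroutine and variable names are hypothetical. The commented-out last assignment marks the device-to-host direction that this patch does not handle.

```fortran
! Hypothetical example: subroutine and variable names are illustrative only.
subroutine host_to_device_example()
  implicit none
  integer, parameter :: n = 10
  real :: a_host(n)          ! host array (explicit shape, no descriptor)
  real, device :: a_dev(n)   ! device-resident array (explicit shape)

  a_host = 1.0

  ! Plain host variable on the rhs: a simple host-to-device transfer.
  a_dev = a_host

  ! The rhs requires evaluation: it is computed on the host into a
  ! temporary, and the transfer to the device is then initiated.
  a_dev = a_host + 2.0

  ! Device-to-host transfer: not lowered by this patch.
  ! a_host = a_dev
end subroutine host_to_device_example
```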