[flang][cuda] Place box value kernel args in managed memory (#197116)

Example:
```fortran
type deviceArray
  integer, allocatable, dimension(:,:), device :: Arr
end type deviceArray
type(deviceArray), allocatable, dimension(:) :: DA
allocate(DA(2))
allocate(DA(1)%Arr(32,32))
call mykernel<<<1,32>>>(DA(1)%Arr, 32) ! cudaErrorIllegalAddress
```

In this code, `DA(1)%Arr` is a device allocatable component inside a
managed derived type. The compiler loads the descriptor, reboxes it on
the host stack, and passes it to the kernel. Since `!fir.box` is lowered
to a pointer in LLVM IR, the kernel receives a host-stack pointer that
it cannot dereference, which causes `cudaErrorIllegalAddress`.

The existing `isDeviceAllocation` check is def-based: it traces the
value's origin to decide whether to use managed memory. It works for
global device descriptors and `cuf.data_attr`-annotated arguments, but
not for this case: the rebox input is a plain host alloca, and only the
*use* as a kernel argument requires managed memory.

**Fix:** Add a use-based `isUsedByGPULaunchFunc` check alongside
`isDeviceAllocation` in the embox, rebox, and load conversions in
CodeGen. When a box descriptor is passed directly to `gpu.launch_func`,
its storage is allocated in managed memory via
`_FortranACUFAllocDescriptor` instead of a stack alloca.
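
The difference between the def-based and the use-based check can be
illustrated with a small standalone model. This is only a sketch and not
the code in this patch: `ToyValue`, `defBasedCheck`, `useBasedCheck`,
`chooseDescriptorStorage`, and `checkToyModel` are hypothetical names,
and operations are modeled as plain strings rather than MLIR operations.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an SSA value (hypothetical; NOT mlir::Value).
struct ToyValue {
  std::string definingOp;           // name of the op that produced the value
  bool hasDeviceAttr;               // models a cuf.data_attr-style annotation
  std::vector<std::string> userOps; // names of the ops that consume the value
};

enum class Storage { StackAlloca, ManagedMemory };

// Def-based check (toy model of the idea behind isDeviceAllocation):
// only the origin of the value is inspected.
bool defBasedCheck(const ToyValue &v) { return v.hasDeviceAttr; }

// Use-based check (toy model of the idea behind isUsedByGPULaunchFunc):
// true when some user of the value is a kernel launch.
bool useBasedCheck(const ToyValue &v) {
  return std::any_of(
      v.userOps.begin(), v.userOps.end(),
      [](const std::string &op) { return op == "gpu.launch_func"; });
}

// The descriptor goes to managed memory if either check fires.
Storage chooseDescriptorStorage(const ToyValue &v) {
  return (defBasedCheck(v) || useBasedCheck(v)) ? Storage::ManagedMemory
                                                : Storage::StackAlloca;
}

// Usage example mirroring the scenario described above.
void checkToyModel() {
  // Rebox of a host alloca that is passed to a kernel launch.
  ToyValue rebox{"fir.rebox", false, {"gpu.launch_func"}};
  assert(!defBasedCheck(rebox)); // the origin alone does not reveal it
  assert(useBasedCheck(rebox));  // the use as a kernel argument does
  assert(chooseDescriptorStorage(rebox) == Storage::ManagedMemory);

  // Descriptor that only feeds an ordinary host call stays on the stack.
  ToyValue hostOnly{"fir.embox", false, {"fir.call"}};
  assert(chooseDescriptorStorage(hostOnly) == Storage::StackAlloca);
}
```

In this model the reboxed descriptor is missed by the def-based check
and caught only by the use-based one, which is the gap described above.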