llvm-project
5c80731f - [flang][cuda] Place box value kernel args in managed memory (#197116)

Commit
13 hours ago
[flang][cuda] Place box value kernel args in managed memory (#197116)

Example:

```fortran
type deviceArray
  integer, allocatable, dimension(:,:), device :: Arr
end type deviceArray

type(deviceArray), allocatable, dimension(:) :: DA
allocate(DA(2))
allocate(DA(1)%Arr(32,32))
call mykernel<<<1,32>>>(DA(1)%Arr, 32) ! cudaErrorIllegalAddress
```

In this code, `DA(1)%Arr` is a device allocatable component inside a managed derived type. The compiler loads the descriptor, reboxes it on the host stack, and passes it to the kernel. Since `!fir.box` is lowered to a pointer in LLVM IR, the kernel receives a host-stack pointer it cannot dereference, causing `cudaErrorIllegalAddress`.

The existing `isDeviceAllocation` check is def-based: it traces the value's origin to decide whether to use managed memory. It works for global device descriptors and `cuf.data_attr`-annotated arguments, but not for this case: the rebox input is a plain host alloca, and only the *use* as a kernel argument requires managed memory.

**Fix:** Add a use-based `isUsedByGPULaunchFunc` check alongside `isDeviceAllocation` in the embox, rebox, and load conversions in CodeGen. When a box descriptor is passed directly to `gpu.launch_func`, its storage is allocated in managed memory via `_FortranACUFAllocDescriptor` instead of a stack alloca.
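The difference between the def-based and use-based checks can be illustrated with a small, self-contained C++ model. This is only a sketch under stated assumptions: `Op`, `OpKind`, `addUse`, and `needsManagedDescriptor` are hypothetical stand-ins invented for illustration, not the MLIR/FIR classes that the real CodeGen operates on, and the bodies of `isDeviceAllocation` and `isUsedByGPULaunchFunc` here are guesses at the general shape of such checks rather than the code in the patch.

```cpp
#include <vector>

// Toy stand-ins for IR operations. These are NOT the real MLIR/FIR classes;
// every name in this block is hypothetical.
enum class OpKind { HostAlloca, DeviceGlobal, Rebox, GpuLaunchFunc };

struct Op {
  OpKind kind;
  std::vector<const Op *> operands; // ops whose results this op consumes
  std::vector<const Op *> users;    // ops that consume this op's result
};

// Record a producer -> consumer edge in both directions.
inline void addUse(Op &producer, Op &consumer) {
  consumer.operands.push_back(&producer);
  producer.users.push_back(&consumer);
}

// Def-based check (assumed shape): walk back through the operands and
// report whether the value originates from a device allocation.
inline bool isDeviceAllocation(const Op &op) {
  if (op.kind == OpKind::DeviceGlobal)
    return true;
  for (const Op *in : op.operands)
    if (isDeviceAllocation(*in))
      return true;
  return false;
}

// Use-based check (assumed shape): report whether the result is passed
// directly to a kernel launch, regardless of where it came from.
inline bool isUsedByGPULaunchFunc(const Op &op) {
  for (const Op *user : op.users)
    if (user->kind == OpKind::GpuLaunchFunc)
      return true;
  return false;
}

// The descriptor needs managed storage if either check fires.
inline bool needsManagedDescriptor(const Op &op) {
  return isDeviceAllocation(op) || isUsedByGPULaunchFunc(op);
}
```

In this model, a `Rebox` whose only operand is a `HostAlloca` fails the def-based check, which mirrors the failing case described above. Once the same `Rebox` gains a `GpuLaunchFunc` user, the use-based check fires and the combined predicate selects managed storage.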