[flang][cuda] Generate cuf.allocate for descriptor with CUDA components (#152041)
The descriptor for derived-type with CUDA components are allocated in
managed memory. The lowering was calling the standard runtime on
allocate statement where it should be a `cuf.allocate` operation.