[flang][cuda] Handle implicit global in cuf kernel and nested statement (#116846)
Update the implicit global detection by looking for them in the CUF
kernel and also update to a walk so nested `fir.address_of` in nested
statement are also accounted for.