[flang] Postpone hlfir.end_associate generation for calls. (#138786)
If we generate hlfir.end_associate at the end of the statement,
we get easier optimizable HLFIR, because there are no compiler
generated operations with side-effects in between the call
and the consumers. This allows more hlfir.eval_in_mem to reuse
the LHS instead of allocating temporary buffer.
I do not think the same can be done for hlfir.copy_out always, e.g.:
```
subroutine test2(x)
interface
function array_func2(x,y)
real:: x(*), array_func2(10), y
end function array_func2
end interface
real :: x(:)
x = array_func2(x, 1.0)
end subroutine test2
```
If we postpone the copy-out until after the assignment, then
the result may be wrong.