[flang][runtime] Fix GPU output for multiple statements (#172363)

I recently broke PRINT statements in GPU device code by trying to
preserve the allocated pseudo-unit; the breakage appears when multiple
PRINTs occur in the same kernel. Preserving the unit turned out to be a
bad idea overall, so I'm reverting to the original protocol that
minimizes allocated memory.