[AMDGPU] LRO: allow same-BB non-lookthrough users for PHI (#160909)

Loops frequently consume the loop-carried value in the header block via
non-lookthrough ops (e.g. byte-wise vector binops). LiveRegOptimizer's
same-BB filter currently prunes these users, so the loop-carried PHI is
not coerced to i32 and the intended packed form is lost.

Relax the filter: when the def is a PHI, allow same-BB non-lookthrough
users. Also fix the check to look at the user (CII) rather than the def
(II) so the walk does not terminate prematurely.
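
For illustration only, the relaxed filter can be sketched as a toy
predicate. This is a simplified model, not the code in LiveRegOptimizer:
the `Inst` struct and both `keepUser*` functions are hypothetical names
invented here, and the "old" rule is reconstructed from the description
above rather than copied from the source.

```cpp
// Toy stand-in for an IR instruction; NOT llvm::Instruction.
struct Inst {
  int Block;          // id of the containing basic block
  bool IsPhi;         // true if this instruction is a PHI
  bool IsLookThrough; // true if the walk looks through this op
};

// Assumed old behaviour: a same-BB user is kept only if it is a
// lookthrough op; any cross-BB user is kept.
bool keepUserOld(const Inst &Def, const Inst &User) {
  if (User.Block != Def.Block)
    return true;
  return User.IsLookThrough;
}

// Relaxed behaviour described above: when the def is a PHI, same-BB
// non-lookthrough users are kept as well.
bool keepUserRelaxed(const Inst &Def, const Inst &User) {
  if (User.Block != Def.Block)
    return true;
  return Def.IsPhi || User.IsLookThrough;
}
```

In this model, a header PHI feeding a same-block vector binop is dropped
by `keepUserOld` and kept by `keepUserRelaxed`, while a non-PHI def with
the same user is dropped by both.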