[AMDGPU] LiveRegOptimizer: consider i8/i16 binops on SDWA (#155800)
PHI-node part was merged with PR#160909.
Extend `isOpLegal` to treat 8/16-bit vector add/sub/and/or/xor as
profitable on SDWA targets (stores and intrinsics remain profitable).
This repacks loop-carried values to i32 across BBs and restores SDWA
lowering instead of scattered lshr/lshl/or sequences.
Testing:
- Local: `check-llvm-codegen-amdgpu` is green (4314/4320 passed, 6
XFAIL).
- Additional: validated in AMD internal CI