[AMDGPU] Optimize S_OR_B32 to S_ADDK_I32 where possible (#177949)
This PR fixes #177753, converting disjoint S_OR_B32 to S_ADDK_I32
whenever possible, it avoids this transformation in case S_OR_B32 can be
converted to bitset.
Note on Test Failures (Draft Status) This change causes significant
register reshuffling across the test suite due to the new allocation
hints and the swaps performed in case src0 is not a register and src1,
along with the change from or to addk. To avoid a massive, noisy diff
during the initial logic review:
This Draft PR only includes a representative sample of updated tests.
CodeGen/AMDGPU/combine-reg-or-const.ll -> Showcases change from S_OR to
S_ADDK
CodeGen/AMDGPU/s-barrier.ll -> Showcases swap between Src0 and Src1 if
src0 is not a register
The rest of the tests show the result of the register allocation hint we
give, I have checked every test I updated and they seem ok to me.
Once the core logic is approved, I will run the update script across the
remaining ~70 failing tests and mark the PR as "Ready for Review."