[LSR] Account for hardware loop instructions (#147958)
A hardware loop instruction combines a subtract, compare with zero, and
branch. We currently account for the compare and branch being combined
into one in Cost::RateFormula, as part of more general handling for
compare-branch-zero, but don't account for the subtract, leading to
suboptimal decisions in some cases.
Fix this in Cost::RateRegister by noticing when we have such a subtract
and discounting the AddRecCost in such a case.