[AMDGPU] Remove AMDGPUISD::FFBH_I32 and add ISD::CTLS lowering (#187694)
This is a continuation of the previously reverted
https://github.com/llvm/llvm-project/pull/178420
The patch removes the custom AMDGPUISD::FFBH_I32 SelectionDAG node. Call
sites that need the raw hardware semantics (LowerINT_TO_FP32,
legalizeITOFP) now use the amdgcn_sffbh intrinsic directly. ISD::CTLS is
added as a Custom operation for i32.
The previous attempt had an issue: the hardware v_ffbh_i32 instruction
(v_cls_i32 on newer targets) has different semantics from ISD::CTLS:
- sffbh returns [1, BitWidth-1] for normal values and -1 for
  all-same-bits values (0 and -1)
- CTLS returns [0, BitWidth-2] for normal values and BitWidth-1 for
  all-same-bits values
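To make the off-by-one concrete, here is a minimal Python sketch that models both semantics for 32-bit values exactly as listed above. The names `sffbh` and `ctls` are illustrative only: they model the documented behavior and are not LLVM or hardware APIs.

```python
# Hypothetical 32-bit reference models of the two semantics described above.
BITS = 32
MASK = (1 << BITS) - 1


def sffbh(x: int) -> int:
    """Model of sffbh: position (counted from the MSB, which is position 0)
    of the first bit that differs from the sign bit, or -1 if every bit
    equals the sign bit."""
    x &= MASK
    sign = x >> (BITS - 1)
    for pos in range(1, BITS):
        if (x >> (BITS - 1 - pos)) & 1 != sign:
            return pos
    return -1


def ctls(x: int) -> int:
    """Model of ISD::CTLS: number of bits after the sign bit that match it
    before the first differing bit."""
    x &= MASK
    sign = x >> (BITS - 1)
    count = 0
    for pos in range(1, BITS):
        if (x >> (BITS - 1 - pos)) & 1 != sign:
            break
        count += 1
    return count


for value in (0x40000000, 1, 0, 0xFFFFFFFF):
    print(hex(value), sffbh(value), ctls(value))
```

Running it prints `0x40000000 1 0`, `0x1 31 30`, `0x0 -1 31` and `0xffffffff -1 31`: for normal values sffbh is CTLS plus one, and the two only diverge in kind for the all-same-bits inputs.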
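LowerCTLS now handles this as sffbh -> umin(sffbh, BitWidth) -> sub 1.
The unsigned umin maps the all-same-bits result -1 (0xFFFFFFFF) to
BitWidth and leaves normal results unchanged, so subtracting 1 yields
exactly the CTLS range.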
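The LowerCTLS expansion can be checked against a plain CTLS definition with a small Python model. This is a sketch under the semantics stated in this message: `sffbh_raw`, `lower_ctls` and `ctls_reference` are hypothetical helper names, and the model treats the hardware result as an unsigned 32-bit value so that Python's `min` behaves like `umin`.

```python
BITS = 32
MASK = (1 << BITS) - 1


def sffbh_raw(x: int) -> int:
    """Hypothetical model of the raw 32-bit sffbh result: the position (from
    the MSB) of the first bit that differs from the sign bit, or 0xFFFFFFFF
    (i.e. -1) when every bit equals the sign bit."""
    x &= MASK
    sign = x >> (BITS - 1)
    for pos in range(1, BITS):
        if (x >> (BITS - 1 - pos)) & 1 != sign:
            return pos
    return MASK


def lower_ctls(x: int) -> int:
    """Model of the expansion sffbh -> umin(sffbh, BitWidth) -> sub 1."""
    s = sffbh_raw(x)
    clamped = min(s, BITS)        # unsigned min: 0xFFFFFFFF becomes 32
    return (clamped - 1) & MASK   # 32 -> 31, normal k -> k - 1


def ctls_reference(x: int) -> int:
    """Count the leading bits after the sign bit that equal the sign bit."""
    x &= MASK
    sign = x >> (BITS - 1)
    n = 0
    while n < BITS - 1 and (x >> (BITS - 2 - n)) & 1 == sign:
        n += 1
    return n
```

For example, `lower_ctls(0)` and `lower_ctls(0xFFFFFFFF)` both give 31 (BitWidth-1), while `lower_ctls(1)` gives 30, matching `ctls_reference`.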
This patch also adds a DAG combine that recognizes the common CTLS idiom:
sub(ctlz(xor(x, sra(x, BitWidth-1))), 1) -> ctls(x)
and an optimization in performMinMaxCombine that folds away the umin
when the input is known not to be all-same-bits, since sffbh's result is
then already in [1, BitWidth-1].
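The idiom sub(ctlz(xor(x, sra(x, BitWidth-1))), 1) works because the xor with the broadcast sign bit turns leading sign-matching bits into leading zeros. The Python sketch below checks that equivalence for 32-bit values; the helper names (`sra`, `ctlz`, `ctls_idiom`, `ctls_reference`) are hypothetical models, with `ctlz` following the ISD::CTLZ convention that ctlz(0) == BitWidth.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x


def sra(x: int, amount: int) -> int:
    """32-bit arithmetic shift right (Python's >> is arithmetic on ints)."""
    return (to_signed(x) >> amount) & MASK


def ctlz(x: int) -> int:
    """Leading zeros of a 32-bit value, with ctlz(0) == BitWidth."""
    return BITS - (x & MASK).bit_length()


def ctls_idiom(x: int) -> int:
    """sub(ctlz(xor(x, sra(x, BitWidth-1))), 1)"""
    x &= MASK
    flipped = x ^ sra(x, BITS - 1)  # ~x for negative x, x otherwise
    return ctlz(flipped) - 1        # drop the sign bit itself from the count


def ctls_reference(x: int) -> int:
    """Count the leading bits after the sign bit that equal the sign bit."""
    x &= MASK
    sign = x >> (BITS - 1)
    n = 0
    while n < BITS - 1 and (x >> (BITS - 2 - n)) & 1 == sign:
        n += 1
    return n
```

Both 0 and -1 flip to 0, so ctlz returns BitWidth and the idiom yields BitWidth-1, matching CTLS for the all-same-bits case.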
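The umin fold relies on one invariant: if x is neither 0 nor -1, sffbh(x) is already at most BitWidth-1, so umin(sffbh(x), BitWidth) cannot change it. The Python sketch below illustrates this with a hypothetical known-bits style check (at least one bit known 0 and one bit known 1). That check is an assumption chosen for illustration; it is not necessarily how performMinMaxCombine proves the input is not all-same-bits.

```python
BITS = 32
MASK = (1 << BITS) - 1


def sffbh_raw(x: int) -> int:
    """Hypothetical model of the raw 32-bit sffbh result (0xFFFFFFFF when all
    bits equal the sign bit)."""
    x &= MASK
    sign = x >> (BITS - 1)
    for pos in range(1, BITS):
        if (x >> (BITS - 1 - pos)) & 1 != sign:
            return pos
    return MASK


def known_not_all_same(known_zero: int, known_one: int) -> bool:
    """Hypothetical conservative check: if at least one bit is known to be 0
    and at least one bit is known to be 1, the value is neither 0 nor
    all-ones (-1)."""
    return (known_zero & MASK) != 0 and (known_one & MASK) != 0


def umin_sffbh(x: int, known_zero: int, known_one: int) -> int:
    """Compute umin(sffbh(x), BitWidth), dropping the umin when the check
    above proves it cannot change the result."""
    s = sffbh_raw(x)
    if known_not_all_same(known_zero, known_one):
        return s              # s is already in [1, BitWidth-1]
    return min(s, BITS)       # keep the clamp: s may be 0xFFFFFFFF
```

For x = 1 with bit 1 known zero and bit 0 known one, `umin_sffbh(1, 0b10, 0b01)` returns 31 without clamping; with no known bits, the clamp is kept and an all-same-bits input still maps to 32.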
Partially addresses #177635