llvm-project
76f88063 - [AMDGPU] Remove AMDGPUISD::FFBH_I32 and add ISD::CTLS lowering (#187694)

Commit
32 days ago
This is a continuation of the previously reverted https://github.com/llvm/llvm-project/pull/178420

The patch removes the custom AMDGPUISD::FFBH_I32 SelectionDAG node. Call sites that need raw hardware semantics (LowerINT_TO_FP32, legalizeITOFP) now use the amdgcn_sffbh intrinsic directly. ISD::CTLS is added as a Custom operation for i32.

The previous attempt had an issue: the hardware v_ffbh_i32 instruction (v_cls_i32 on newer targets) has different semantics than ISD::CTLS:
- sffbh returns [1, BitWidth-1] for normal values, -1 for all-same-bits
- CTLS returns [0, BitWidth-2] for normal values, BitWidth-1 for all-same-bits

LowerCTLS now handles this by: sffbh -> umin(sffbh, BitWidth) -> sub 1. The unsigned min maps the all-same-bits result (-1, the largest unsigned value) to BitWidth, so the final subtraction yields BitWidth-1; normal values pass through the umin unchanged and are shifted down by one.

The current patch also adds a DAG combine to recognize the common CTLS idiom:
sub(ctlz(xor(x, sra(x, BitWidth-1))), 1) -> ctls(x)
and an optimization in performMinMaxCombine to fold away the umin when the input is known not to be all-same-bits.

Partially addresses #177635
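The range adjustment described above can be checked with a small scalar model. The sketch below is not code from the patch: the helper names (sffbhModel, ctlsReference, ctlsLowered, ctlz32) are hypothetical, the sffbh behaviour is modelled only from the semantics stated in this message, and ctlz is assumed to return BitWidth for a zero input.

```cpp
#include <algorithm>
#include <cstdint>

constexpr uint32_t BitWidth = 32;

// Count leading zeros; assumed to return BitWidth for an input of 0.
uint32_t ctlz32(uint32_t v) {
  uint32_t n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (v & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// Hypothetical model of the sffbh semantics as described in the message:
// index (counted from the MSB) of the first bit that differs from the sign
// bit, in [1, BitWidth-1], or -1 (all ones) when every bit equals the sign.
uint32_t sffbhModel(int32_t x) {
  uint32_t u = static_cast<uint32_t>(x);
  uint32_t sign = u >> (BitWidth - 1);
  for (uint32_t i = 1; i < BitWidth; ++i)
    if (((u >> (BitWidth - 1 - i)) & 1u) != sign)
      return i;
  return 0xFFFFFFFFu;
}

// The CTLS idiom from the message: sub(ctlz(xor(x, sra(x, BitWidth-1))), 1).
uint32_t ctlsReference(int32_t x) {
  uint32_t u = static_cast<uint32_t>(x);
  uint32_t signMask = (x < 0) ? 0xFFFFFFFFu : 0u; // same bits as sra(x, 31)
  return ctlz32(u ^ signMask) - 1;
}

// The lowering from the message: sffbh -> umin(sffbh, BitWidth) -> sub 1.
uint32_t ctlsLowered(int32_t x) {
  uint32_t clamped = std::min<uint32_t>(sffbhModel(x), BitWidth);
  return clamped - 1;
}
```

Under these assumptions the lowered form and the idiom agree on both the normal range and the all-same-bits inputs (0 and -1), which is the case the earlier attempt got wrong.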