llvm-project
b8add371 - [AMDGPU] Add pattern to select scalar ops for fshr with uniform operands (#165295)

Commit
27 days ago
Reasoning behind the proposed change: this helps us move away from selecting V_ALIGNBIT for fshr with uniform operands.

V_ALIGNBIT is defined in the ISA as:

    D0.u32 = 32'U(({ S0.u32, S1.u32 } >> S2.u32[4 : 0]) & 0xffffffffLL)

Note: S0 carries the MSBs and S1 carries the LSBs of the value being aligned.

I interpret that as concat(S0, S1) >> S2, where only the low 5 bits of S2 (mask 0x1F) are used as the shift amount and the 0xffffffff mask returns the lower 32 bits of the result.

fshr is:

    fshr i32 %src0, i32 %src1, i32 %src2

where concat(%src0, %src1) is the 64-bit value formed with %src0 as the high 32 bits and %src1 as the low 32 bits, and %src2 is the shift amount. Only the lower 32 bits are returned.

So these two are identical, and V_ALIGNBIT can be expanded through bit manipulation as:

    Concat:          S1 | (S0 << 32)
    Shift:           (S1 | (S0 << 32)) >> S2
    Break the shift: (S1 >> S2) | (S0 << (32 - S2))

The proposed pattern does exactly this.

Additionally, src2 in the fshr pattern:
* must be 0–31.
* If the shift is ≥ 32, hardware semantics differ; you must handle it with extra instructions.

The extra S_ANDs restrict the shift amount to its low 5 bits.
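The equivalence argued above can be checked numerically. The following is a minimal sketch in Python, not the TableGen pattern from this commit: the function names `fshr_reference` and `fshr_split` are hypothetical, and the model assumes the shift amount is masked to its low 5 bits as in the V_ALIGNBIT definition.

```python
MASK32 = 0xFFFFFFFF  # keeps the low 32 bits of a Python integer


def fshr_reference(src0: int, src1: int, src2: int) -> int:
    """64-bit model: concat(src0, src1) >> src2[4:0], truncated to 32 bits."""
    shift = src2 & 0x1F
    wide = ((src0 & MASK32) << 32) | (src1 & MASK32)
    return (wide >> shift) & MASK32


def fshr_split(src0: int, src1: int, src2: int) -> int:
    """Split model: (src1 >> s) | (src0 << (32 - s)) on 32-bit halves."""
    shift = src2 & 0x1F
    low = (src1 & MASK32) >> shift
    # Python integers are unbounded, so for shift == 0 the term
    # (src0 << 32) & MASK32 is simply 0. A fixed-width 32-bit shift by 32
    # may not behave this way, so real code should treat that case with care.
    high = ((src0 & MASK32) << (32 - shift)) & MASK32
    return low | high
```

For example, `fshr_split(0x12345678, 0x9ABCDEF0, 8)` and `fshr_reference(0x12345678, 0x9ABCDEF0, 8)` both evaluate to `0x789ABCDE`. In this model a shift amount of 40 gives the same result as 8, which is the effect of the 5-bit mask.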