[AMDGPU] Add pattern to select scalar ops for fshr with uniform operands (#165295)
Reasoning behind the proposed change: it helps us move away from selecting
v_alignbit for fshr with uniform operands.
V_ALIGNBIT is defined in the ISA as:
D0.u32 = 32'U(({ S0.u32, S1.u32 } >> S2.u32[4 : 0]) & 0xffffffffLL)
Note: S0 carries the MSBs and S1 carries the LSBs of the value being
aligned.
I interpret that as: concat(S0, S1) >> S2[4:0]. The shift amount is masked
with 0x1F, and the 0xffffffff mask returns the lower 32 bits.
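As an illustration of the formula above, here is a minimal Python model of
it. The function name v_alignbit and the sample values are only for this
sketch; they are not part of the patch.

```python
MASK32 = 0xFFFFFFFF

def v_alignbit(s0, s1, s2):
    """Model of ({S0, S1} >> S2[4:0]) & 0xffffffff from the ISA text."""
    concat = ((s0 & MASK32) << 32) | (s1 & MASK32)  # S0 = MSBs, S1 = LSBs
    shift = s2 & 0x1F                               # only S2[4:0] is used
    return (concat >> shift) & MASK32               # keep the low 32 bits

# {0x12345678, 0x9ABCDEF0} >> 8 has low 32 bits 0x789ABCDE
print(hex(v_alignbit(0x12345678, 0x9ABCDEF0, 8)))
```

A shift amount of 40 gives the same result as 8 in this model, because
only the low 5 bits of S2 are read.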
fshr:
fshr i32 %src0, i32 %src1, i32 %src2
Where:
concat(%src0, %src1) represents the 64-bit value formed by %src0 as the
high 32 bits and %src1 as the low 32 bits.
%src2 is the shift amount, taken modulo 32.
Only the lower 32 bits of the shifted value are returned.
So the two operations are identical.
So, I can expand the V_ALIGNBIT through bit manipulation as:
Concat: S1 | (S0 << 32)
Shift: ((S1 | (S0 << 32)) >> S2)
Break the shift: (S1 >> S2) | (S0 << (32 - S2))
The proposed pattern does exactly this.
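The split form can be checked against the 64-bit concat-and-shift form
with a small Python sketch. The helper names below are hypothetical, and
the sketch only models the arithmetic for shift amounts 0..31; it says
nothing about which instructions the pattern actually emits.

```python
MASK32 = 0xFFFFFFFF

def shift_concat(s0, s1, s2):
    """Reference: low 32 bits of the 64-bit value {s0, s1} >> s2."""
    concat = ((s0 & MASK32) << 32) | (s1 & MASK32)
    return (concat >> s2) & MASK32

def shift_split(s0, s1, s2):
    """Split form: (S1 >> S2) | (S0 << (32 - S2)), truncated to 32 bits."""
    lo = (s1 & MASK32) >> s2
    # For s2 == 0 this term is S0 << 32, which truncates to 0 here.
    hi = ((s0 & MASK32) << (32 - s2)) & MASK32
    return lo | hi

def check_split_matches_concat():
    """Compare both forms over a few operand values and every shift 0..31."""
    samples = [0x0, 0x1, 0x80000000, 0x12345678, 0x9ABCDEF0, 0xFFFFFFFF]
    return all(
        shift_split(a, b, s) == shift_concat(a, b, s)
        for a in samples for b in samples for s in range(32)
    )

print(check_split_matches_concat())
```

Note that Python integers are unbounded, so the S2 == 0 case (a left
shift by 32) is well-defined in this model; a 32-bit shifter would need
that case considered separately.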
Additionally, src2 in the fshr pattern:
* must be in the range 0–31;
* needs extra instructions if it can be ≥ 32, because the hardware
semantics differ for such shift amounts.
The extra S_ANDs restrict the shift amount to its low 5 bits.
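To show why the mask matters, the Python sketch below compares a plain
64-bit right shift with and without the 0x1F mask against fshr's
modulo-32 behaviour. All three function names are hypothetical, and the
unmasked variant is an assumed simplification, not a model of any
specific scalar instruction.

```python
MASK32 = 0xFFFFFFFF

def concat64(src0, src1):
    """src0 supplies the high 32 bits, src1 the low 32 bits."""
    return ((src0 & MASK32) << 32) | (src1 & MASK32)

def fshr32_reference(src0, src1, src2):
    """fshr i32 semantics: the shift amount is taken modulo 32."""
    return (concat64(src0, src1) >> (src2 % 32)) & MASK32

def shift_unmasked(src0, src1, src2):
    """Assumed lowering without the AND: shifts by the full amount."""
    return (concat64(src0, src1) >> src2) & MASK32

def shift_masked(src0, src1, src2):
    """Assumed lowering with the AND: shift amount limited to 5 bits."""
    return (concat64(src0, src1) >> (src2 & 0x1F)) & MASK32

# With a shift amount of 40, only the masked variant agrees with fshr.
print(hex(fshr32_reference(0x12345678, 0x9ABCDEF0, 40)))
print(hex(shift_masked(0x12345678, 0x9ABCDEF0, 40)))
print(hex(shift_unmasked(0x12345678, 0x9ABCDEF0, 40)))
```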