[AArch64] Sanitise pow inputs using a target DAG combine (#192958)
Sometimes we see LLVM IR like this:
%pow = call fast <4 x float> @llvm.pow.v4f32(...)
%fcmp = fcmp fast ...
%res = select <4 x i1> %fcmp, <4 x float> %val, <4 x float> %pow
where the pow intrinsic is called unconditionally, but only certain
lanes of the result are used. In fact, LLVM actively encourages code
like this due to the intrinsic being marked as safe to speculatively
execute. However, we know when using certain vector libraries like
ArmPL that this can be very costly if the unused lanes would take
the pow call down an expensive path. For example, if an input to
pow is a special value (inf, NaN, -0) then it triggers slow special
case handling, and ultimately the result is going to be ignored
anyway. For this reason we prefer to sanitise the pow input to
use 'safe' values when we know the result is going to be discarded.
The above example LLVM IR would then look like:
%fcmp = fcmp fast ...
%sel = select <4 x i1> %fcmp, <4 x float> splat(float 1.0), ...
%pow = call fast <4 x float> @llvm.pow.v4f32(<4 x float> %sel, ...)
%res = select <4 x i1> %fcmp, <4 x float> %val, <4 x float> %pow
where the value 1.0 is chosen as the replacement base because
pow(1.0, y) is known to always return 1.0 for every power y.
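
As an illustration only, the lane-wise effect of the combine can be
modelled in plain C++. This is a rough sketch and not the DAG combine
itself: the names Vec4, Mask4, vec_pow, select_lanes, sanitise_base,
original and sanitised are all hypothetical, and std::pow stands in for
the vector library call. It assumes, as in the IR above, that a true
mask lane means the pow result for that lane is discarded.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical 4-lane stand-ins for <4 x float> and <4 x i1>.
using Vec4 = std::array<float, 4>;
using Mask4 = std::array<bool, 4>;

// Stand-in for the vector pow call: applies std::pow to each lane.
Vec4 vec_pow(const Vec4 &x, const Vec4 &y) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = std::pow(x[i], y[i]);
  return r;
}

// Models 'select': lane i takes a[i] if m[i] is true, otherwise b[i].
Vec4 select_lanes(const Mask4 &m, const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = m[i] ? a[i] : b[i];
  return r;
}

// Replaces the pow base with 1.0 in every lane whose result is discarded.
Vec4 sanitise_base(const Mask4 &cmp, const Vec4 &x) {
  const Vec4 ones = {1.0f, 1.0f, 1.0f, 1.0f};
  return select_lanes(cmp, ones, x);
}

// Original form: pow sees every lane, including special values.
Vec4 original(const Mask4 &cmp, const Vec4 &val, const Vec4 &x,
              const Vec4 &y) {
  return select_lanes(cmp, val, vec_pow(x, y));
}

// Sanitised form: discarded lanes feed 1.0 into pow instead.
Vec4 sanitised(const Mask4 &cmp, const Vec4 &val, const Vec4 &x,
               const Vec4 &y) {
  return select_lanes(cmp, val, vec_pow(sanitise_base(cmp, x), y));
}
```

Both forms produce the same final vector, because the discarded lanes
are overwritten by the outer select either way. The only difference is
that pow never sees the special values in those lanes.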