llvm-project
2c39855f - [AArch64] Sanitise pow inputs using a target DAG combine (#192958)

1 day ago
Sometimes we see LLVM IR like this:

    %pow = call fast <4 x float> @llvm.pow.v4f32(...)
    %fcmp = fcmp fast ...
    %res = select <4 x i1> %fcmp, <4 x float> %val, <4 x float> %pow

Here the pow intrinsic is called unconditionally, but only some lanes of the result are used. LLVM actively encourages code like this, because the intrinsic is marked as safe to speculatively execute.

With certain vector libraries such as ArmPL, however, this can be very costly when the unused lanes take the pow call down an expensive path. For example, a special input value to pow (inf, NaN, -0) triggers slow special-case handling, even though that lane's result is ignored anyway.

For this reason we prefer to sanitise the pow input, replacing it with a 'safe' value in every lane whose result will be discarded. The example above then becomes:

    %fcmp = fcmp fast ...
    %sel = select <4 x i1> %fcmp, <4 x float> splat(float 1.0), ...
    %pow = call fast <4 x float> @llvm.pow.v4f32(<4 x float> %sel, ...)
    %res = select <4 x i1> %fcmp, <4 x float> %val, <4 x float> %pow

The value 1.0 is chosen because pow(1.0, y) is known to return 1.0 for every power y.
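The following C++ sketch illustrates the idea outside LLVM. It emulates the `<4 x float>` lanes with `std::array` and checks that sanitising the pow base leaves every lane that survives the final select unchanged. The names `Vec4`, `Mask4`, `select4`, `pow4`, `unsanitised`, and `sanitised` are hypothetical and do not come from LLVM or ArmPL. The sketch does not model the actual AArch64 DAG combine or the cost of special-case paths. It only models the lane semantics the combine relies on.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical stand-ins for <4 x float> and the <4 x i1> fcmp mask.
using Vec4 = std::array<float, 4>;
using Mask4 = std::array<bool, 4>;

// Lane-wise select(mask, a, b): mask[i] ? a[i] : b[i], like the IR select.
Vec4 select4(const Mask4 &mask, const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = mask[i] ? a[i] : b[i];
  return r;
}

// Lane-wise pow, standing in for a vector-library call behind @llvm.pow.v4f32.
Vec4 pow4(const Vec4 &x, const Vec4 &y) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::pow(x[i], y[i]);
  return r;
}

// Original form: pow runs on every lane (including special values), and the
// select then discards the lanes where mask is true.
Vec4 unsanitised(const Mask4 &mask, const Vec4 &val, const Vec4 &x,
                 const Vec4 &y) {
  return select4(mask, val, pow4(x, y));
}

// Sanitised form: lanes whose pow result is discarded get base 1.0 first.
// C and C++ (via C Annex F) define pow(1, y) == 1 for any y, even NaN, so
// those lanes stay on the ordinary path and the kept lanes are unchanged.
Vec4 sanitised(const Mask4 &mask, const Vec4 &val, const Vec4 &x,
               const Vec4 &y) {
  const Vec4 ones{1.0f, 1.0f, 1.0f, 1.0f};
  const Vec4 safe_x = select4(mask, ones, x);  // mirrors %sel
  return select4(mask, val, pow4(safe_x, safe_x == safe_x ? y : y));
}
```

The same mask drives both selects, so exactly the lanes that are sanitised are the lanes whose pow result is thrown away. This sketch sanitises only the base, matching the example IR above. Because pow(1.0, y) is 1.0 for every y, a special exponent in a discarded lane does not change the result either.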