[libclc] Simplify unary_def_scalarize.inc's use in __clc_erf/erfc/tgamma (#150181)
Also delete unary_def_via_fp32.inc. There are small changes in
amdgcn--amdhsa.bc due to vector conversion is scalarized, e.g.
%2 = fpext <4 x half> %0 to <4 x float>
%3 = extractelement <4 x float> %2, i64 0
%4 = tail call float @llvm.fabs.f32(float %3)
->
%2 = extractelement <4 x half> %0, i64 0
%3 = tail call half @llvm.fabs.f16(half %2)
%4 = fpext half %3 to float