[X86] LowerBUILD_VECTORvXi1 - attempt to fold as VPTESTMB(BUILD_VECTOR_vXi8(X),1) (#198166)

i1 scalar elements are legalised to i8 (and the BUILD_VECTOR relies on
implicit truncation), but it is often cheaper to perform the
BUILD_VECTOR as a vXi8 and then perform a comparison to convert it to
the vXi1 mask, assuming we're inserting more than one non-constant i1
element.

Without BWI we have to extend this to vXi32 types to perform the
comparison.
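
As a rough illustration only (this is a scalar model, not the LLVM
lowering code), the fold can be sketched in portable C++. The names
vptestm, buildMaskViaI8 and buildMaskViaI32 are hypothetical, and the
use of a dword test such as VPTESTMD on the non-BWI path is an
assumption based on the vXi32 extension described above:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of the VPTESTM{B,D} semantics:
// mask bit i is set when (a[i] & b[i]) != 0.
template <typename LaneT, std::size_t N>
std::uint64_t vptestm(const std::array<LaneT, N> &a,
                      const std::array<LaneT, N> &b) {
  static_assert(N <= 64, "mask must fit in 64 bits");
  std::uint64_t mask = 0;
  for (std::size_t i = 0; i < N; ++i)
    if ((a[i] & b[i]) != 0)
      mask |= std::uint64_t{1} << i;
  return mask;
}

// Hypothetical helper modelling VPTESTMB(BUILD_VECTOR_vXi8(X), 1).
// Each i1 element has been legalised to i8, so only bit 0 is meaningful;
// testing against a splat of 1 performs the implicit truncation.
template <std::size_t N>
std::uint64_t buildMaskViaI8(const std::array<std::uint8_t, N> &elts) {
  std::array<std::uint8_t, N> ones;
  ones.fill(1);
  return vptestm(elts, ones);
}

// Hypothetical helper modelling the non-BWI path: widen every lane to
// i32 first, then apply the same test on 32-bit lanes (assumed VPTESTMD).
template <std::size_t N>
std::uint64_t buildMaskViaI32(const std::array<std::uint8_t, N> &elts) {
  std::array<std::uint32_t, N> wide;
  std::array<std::uint32_t, N> ones;
  for (std::size_t i = 0; i < N; ++i)
    wide[i] = elts[i]; // zero-extend; only bit 0 is tested below
  ones.fill(1);
  return vptestm(wide, ones);
}
```

Both helpers should yield the same mask for the same input, since only
bit 0 of each element takes part in the test.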

There's probably a lot more we can do here (e.g. v2i8/v4i8/v8i8 types),
but this patch at least addresses the worst codegen cases.

Fixes #179334