[SLP] Reject 2-element vectorization when vector inst count exceeds scalar

The LLVM cost model uses integer-valued throughput costs, which cannot
represent fractional costs. For 2-element vectors, this rounding can
make vectorization appear profitable when it actually produces more
instructions than the scalar code, because the overhead from shuffles,
inserts, extracts, and buildvectors is underestimated.
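
As a rough illustration of the rounding problem, the sketch below sums
per-instruction costs twice: once as fractional values and once after
truncating each to an integer. The helper names and the cost values are
hypothetical, and truncation toward zero is an assumption made for the
example; real targets arrive at their integer costs in other ways.

```cpp
#include <cassert>
#include <vector>

// Sum of the fractional per-instruction costs (the "real" overhead in
// this toy model).
double trueCost(const std::vector<double> &Costs) {
  double Sum = 0.0;
  for (double C : Costs)
    Sum += C;
  return Sum;
}

// Sum of the costs after each one is truncated to an integer.
// Assumption: a sub-unit cost such as 0.5 is reported as 0.
int modeledCost(const std::vector<double> &Costs) {
  int Sum = 0;
  for (double C : Costs) {
    assert(C >= 0.0 && "costs in this sketch are non-negative");
    Sum += static_cast<int>(C);
  }
  return Sum;
}
```

With two scalar adds at cost 1.0 each and a vector tree made of one
vector add (1.0) plus two inserts (0.5 each), the modeled vector cost is
1 against a scalar cost of 2, so the tree looks profitable even though it
has three instructions instead of two and the same fractional cost.
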

Add an instruction-count safety check in getTreeCost that estimates
the number of vector instructions (including gathers, shuffles, and
extracts) and compares it against the number of scalar instructions.
If the vector code would produce more instructions, reject the tree
regardless of what the cost model says. This catches cases where
fractional cost rounding hides real overhead.
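
The check described above can be sketched roughly as follows. This is
not the code from the patch: TreeSummary, estimateVectorInsts, and
rejectByInstCount are hypothetical names, and counting one instruction
per gather, shuffle, and extract is a simplifying assumption.

```cpp
#include <cassert>

// Hypothetical summary of an SLP tree. The real change derives its
// counts inside getTreeCost, and its data structures will differ.
struct TreeSummary {
  unsigned RootWidth;   // number of lanes at the tree root
  unsigned ScalarInsts; // scalar instructions the tree would replace
  unsigned VectorOps;   // vectorized operations
  unsigned Gathers;     // gather/buildvector nodes (assumed 1 inst each)
  unsigned Shuffles;    // shuffles needed to reorder lanes
  unsigned Extracts;    // extracts for scalars used outside the tree
};

// Estimated number of instructions in the vectorized form.
unsigned estimateVectorInsts(const TreeSummary &T) {
  return T.VectorOps + T.Gathers + T.Shuffles + T.Extracts;
}

// Returns true when the tree should be rejected no matter what the
// cost model reports. CheckEnabled stands in for -slp-inst-count-check.
bool rejectByInstCount(const TreeSummary &T, bool CheckEnabled = true) {
  assert(T.RootWidth >= 2 && "a vector tree has at least two lanes");
  if (!CheckEnabled)
    return false;
  if (T.RootWidth != 2) // only 2-element root trees are checked
    return false;
  return estimateVectorInsts(T) > T.ScalarInsts;
}
```

For example, a 2-element tree that replaces 2 scalar instructions with
1 vector op, 1 gather, and 1 extract is rejected (3 > 2), while the same
counts under a 4-element root are left to the cost model.
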

The check is gated behind -slp-inst-count-check (default: on) and
only applies to 2-element root trees, where rounding errors matter most.

Reviewers: hiraditya, bababuck, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/190414