[SLP] Apply reused-scalar reduction counters at the vectorized lane
For horizontal reductions with reused scalars, the scale holding the
per-scalar reuse counters was filled in deduplicated candidate order.
The lane order of the emitted reduction vector, however, is defined by
the root node, which may be reordered or split (SplitVectorize). As a
result, a repeat count could be applied to the wrong lane, producing an
incorrect reduction result. Place each counter at the lane its matching
candidate is actually vectorized to.
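
The mismatch can be illustrated with a small standalone model. This is
only a sketch, not the SLPVectorizer code: the names Dedup,
scaleByDedupOrder, scaleByLane and reduceAdd are hypothetical, scalars
are modelled as integer ids, and the lane order is assumed to be given.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical model: each scalar is identified by an integer id.
using Id = int;

// Ids in first-seen (deduplicated) order plus a repeat count per id.
struct Dedup {
  std::vector<Id> Order;
  std::map<Id, unsigned> Count;
};

Dedup dedup(const std::vector<Id> &Candidates) {
  Dedup D;
  for (Id C : Candidates)
    if (D.Count[C]++ == 0) // first occurrence of this id
      D.Order.push_back(C);
  return D;
}

// Scale filled in deduplicated order: correct only if the lanes happen to
// follow that same order.
std::vector<unsigned> scaleByDedupOrder(const Dedup &D) {
  std::vector<unsigned> Scale;
  for (Id C : D.Order)
    Scale.push_back(D.Count.at(C));
  return Scale;
}

// Scale filled per lane: lane I receives the count of the id in lane I.
std::vector<unsigned> scaleByLane(const Dedup &D,
                                  const std::vector<Id> &Lanes) {
  std::vector<unsigned> Scale;
  for (Id C : Lanes)
    Scale.push_back(D.Count.at(C));
  return Scale;
}

// Multiply each lane by its scale entry, then add-reduce the lanes.
int reduceAdd(const std::vector<Id> &Lanes, const std::map<Id, int> &Value,
              const std::vector<unsigned> &Scale) {
  int Sum = 0;
  for (std::size_t I = 0; I < Lanes.size(); ++I)
    Sum += Value.at(Lanes[I]) * static_cast<int>(Scale[I]);
  return Sum;
}
```

In this model, take candidates a, b, a, a, c, b with a = 10, b = 100,
c = 1000, so the scalar sum is 1230. If the lanes are assumed to be
reordered to (c, a, b), the per-lane scale (1, 3, 2) reproduces 1230,
while the deduplicated-order scale (3, 2, 1) yields 3120.
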
Fixes #206476
Pull Request: https://github.com/llvm/llvm-project/pull/206611