abstract vectorized reduction utils on CPU (#92284)
This PR factors the vectorized reduction utilities on CPU out into shared helpers, so that multiple reduction operators, such as `scatter_reduce`, `segment_reduce`, and `spmm_reduce`, can reuse them.
No functional or performance change.
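
To illustrate the kind of sharing described above, here is a minimal standalone sketch, not the code from this PR. All names (`Reduce`, `reduce_init`, `reduce_combine`, `reduce_update`, `segment_reduce_rows`) are hypothetical, and plain scalar loops stand in for the vectorized kernels. The idea is that each operator only decides which input rows map to which output row, while initialization and accumulation live in common helpers.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical names throughout; these are NOT the identifiers used in the PR.
enum class Reduce { Sum, Prod, Max, Min };

// Identity element for each reduction, used to initialize an output slot.
template <typename T>
T reduce_init(Reduce op) {
  switch (op) {
    case Reduce::Sum:  return T(0);
    case Reduce::Prod: return T(1);
    case Reduce::Max:  return std::numeric_limits<T>::lowest();
    case Reduce::Min:  return std::numeric_limits<T>::max();
  }
  return T(0);
}

// Combine an accumulator with one new value.
template <typename T>
T reduce_combine(Reduce op, T acc, T x) {
  switch (op) {
    case Reduce::Sum:  return acc + x;
    case Reduce::Prod: return acc * x;
    case Reduce::Max:  return std::max(acc, x);
    case Reduce::Min:  return std::min(acc, x);
  }
  return acc;
}

// Shared helper: accumulate src[0..n) into out[0..n) elementwise.
// A real kernel would use SIMD here; this scalar loop is a stand-in.
template <typename T>
void reduce_update(Reduce op, T* out, const T* src, int64_t n) {
  for (int64_t i = 0; i < n; ++i) {
    out[i] = reduce_combine(op, out[i], src[i]);
  }
}

// One possible caller: reduce consecutive groups of rows of a row-major
// matrix, where lengths[s] is the number of rows in segment s.
template <typename T>
std::vector<T> segment_reduce_rows(Reduce op,
                                   const std::vector<T>& data,
                                   int64_t cols,
                                   const std::vector<int64_t>& lengths) {
  std::vector<T> out(lengths.size() * cols, reduce_init<T>(op));
  int64_t row = 0;
  for (std::size_t s = 0; s < lengths.size(); ++s) {
    for (int64_t r = 0; r < lengths[s]; ++r, ++row) {
      reduce_update(op, out.data() + s * cols, data.data() + row * cols, cols);
    }
  }
  return out;
}
```

A scatter-style or sparse-matrix caller would reuse `reduce_init` and `reduce_update` unchanged and differ only in how it picks the destination row for each source row.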
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92284
Approved by: https://github.com/ezyang