[UCC] Add pre & post processing for CPU collectives (#89030)
Summary: The CPU block in `collective_post` was missing pre- and post-processing. The reduce-scatter implementation relies on the pre-processing callback to flatten the input tensors; because that callback was never invoked, garbage values were being passed.
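
For illustration only, here is a minimal sketch of the call order this change restores. It is not the actual ProcessGroupUCC code: `run_cpu_collective` and `flattened_sum_demo` are hypothetical names, plain `std::vector<float>` stands in for tensors, and a simple sum stands in for the reduce-scatter.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for the CPU path of a collective launcher.
// Assumption: callbacks are optional and run in the order
// pre-process -> collective -> post-process.
void run_cpu_collective(
    const std::function<void()>& preproc,
    const std::function<void()>& collective,
    const std::function<void()>& postproc) {
  if (preproc) {
    preproc();  // e.g. flatten the inputs into one contiguous buffer
  }
  collective();
  if (postproc) {
    postproc();  // e.g. copy results back to the caller's outputs
  }
}

// Flattens the inputs in the pre-processing callback, then sums the
// flattened buffer as a placeholder for the real collective.
float flattened_sum_demo(const std::vector<std::vector<float>>& inputs) {
  std::vector<float> flat;
  float result = 0.0f;
  run_cpu_collective(
      [&] {
        for (const auto& t : inputs) {
          flat.insert(flat.end(), t.begin(), t.end());
        }
      },
      [&] {
        for (float v : flat) {
          result += v;
        }
      },
      nullptr);  // no post-processing needed in this sketch
  return result;
}
```

If the pre-processing step were skipped in this sketch, `flat` would stay empty and the collective would operate on a buffer that was never filled, which is analogous to the failure described above.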
Test Plan: Tested the reduce-scatter collective using PARAM
Reviewed By: eastzone
Differential Revision: D41291592
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89030
Approved by: https://github.com/kingchc, https://github.com/kwen2501