expand functional map for reduced floating-point types on CPU (#104584)
When the input is float16 or bfloat16, return the output of the vec::reduce functions in the accumulation dtype instead of the input dtype, to reduce rounding error.
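
To illustrate the rounding error this avoids, here is a minimal standalone sketch. It is not the ATen implementation: `round_to_bf16`, `reduce_sum_return_bf16`, and `reduce_sum_return_float` are hypothetical helpers, bfloat16 is emulated with plain `float` values, and the "round back to the input dtype" variant is an assumption about the behavior being replaced.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for bfloat16: round a float to the nearest bfloat16
// (round-to-nearest-even) and widen it back to float.
// NaN/Inf handling is omitted for brevity.
inline float round_to_bf16(float x) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits += 0x7FFFu + ((bits >> 16) & 1u);  // round to nearest, ties to even
  bits &= 0xFFFF0000u;                    // keep only the upper 16 bits
  float out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Assumed "before" behavior: accumulate in float, then round the result
// back to the reduced input dtype.
inline float reduce_sum_return_bf16(const std::vector<float>& bf16_values) {
  float acc = 0.0f;
  for (float v : bf16_values) acc += v;
  return round_to_bf16(acc);
}

// Behavior described in this change: accumulate in float and return the
// float accumulator unchanged.
inline float reduce_sum_return_float(const std::vector<float>& bf16_values) {
  float acc = 0.0f;
  for (float v : bf16_values) acc += v;
  return acc;
}

// Small demonstration: 257 ones sum to 257, which needs 9 significand bits.
// bfloat16 keeps only 8, so rounding the result loses the last unit.
inline void demo() {
  const std::vector<float> ones(257, 1.0f);  // 1.0 is exact in bfloat16
  assert(reduce_sum_return_float(ones) == 257.0f);
  assert(reduce_sum_return_bf16(ones) == 256.0f);
}
```

In this sketch the extra rounding step is the only difference between the two helpers, so any caller that feeds the reduced value into further float arithmetic benefits from skipping it.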
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104584
Approved by: https://github.com/jgong5, https://github.com/peterbell10
ghstack dependencies: #104583