[CPU] add fp16 support to shm inference_all_reduce (#5669)
This PR adds FP16 support to DeepSpeed SHM inference_all_reduce.
Previously, only FP32 and BF16 were supported. This change aligns
with PyTorch's CPU support for the FP16 datatype.
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>