BF16_Optimizer: add support for bf16 grad acc (#4713)
The default gradient accumulation data type is fp32.
Adding the following to the DeepSpeed JSON config file:
"data_types": {"grad_accum_dtype": "bf16"}
makes gradient accumulation run in bf16 instead.
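
A minimal sketch of where the setting might sit in a full config, built in Python and serialized to JSON. Only the "data_types" entry comes from this change. The batch-size values are placeholders, and enabling the "bf16" section is an assumption about how BF16_Optimizer gets selected.

```python
import json

# Hypothetical minimal DeepSpeed config. Only "data_types" is taken from
# this commit; every other value is a placeholder for illustration.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "gradient_accumulation_steps": 4,     # placeholder value
    "bf16": {"enabled": True},            # assumption: bf16 training is enabled
    "data_types": {"grad_accum_dtype": "bf16"},  # accumulate gradients in bf16
}

# Serialize to the JSON text that would go in the DeepSpeed config file.
config_text = json.dumps(ds_config, indent=2)
print(config_text)
```

Leaving out the "data_types" entry keeps the fp32 default described above.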
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>