estimate_zero2_model_states_mem_needs: fixing memory estimation (#5099)
The estimate was assuming 4 bytes per model parameter and 4 bytes per gradient.
Fixed it to 2 bytes each, under the assumption of FP16/BF16.
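A minimal sketch of the per-element arithmetic behind this change. It counts only one copy of the parameters plus one copy of the gradients, and ignores optimizer states and other buffers. The helper `params_plus_grads_bytes` and the 1B-parameter model size are hypothetical and chosen for illustration. This is not DeepSpeed's `estimate_zero2_model_states_mem_needs`.

```python
# Hypothetical helper showing the per-element byte arithmetic only;
# it is NOT DeepSpeed's estimate_zero2_model_states_mem_needs.
BYTES_FP32 = 4  # previous assumption: 4 bytes per element
BYTES_HALF = 2  # FP16 and BF16 both store 2 bytes per element


def params_plus_grads_bytes(total_params: int, bytes_per_elem: int) -> int:
    """Bytes for one copy of the parameters plus one copy of the gradients."""
    return total_params * bytes_per_elem * 2


total_params = 1_000_000_000  # illustrative 1B-parameter model
old = params_plus_grads_bytes(total_params, BYTES_FP32)
new = params_plus_grads_bytes(total_params, BYTES_HALF)
print(f"old: {old / 1e9} GB, new: {new / 1e9} GB")  # old: 8.0 GB, new: 4.0 GB
```

Under these assumptions, switching from 4 to 2 bytes per element halves the parameter and gradient portion of the estimate.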
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>