Fix zero stage2 cpu_offload when some trainable model parameters are skipped in training (#861)
* Fix zero stage2 cpu_offload when some trainable model parameters are skipped in training, as in https://github.com/microsoft/DeepSpeed/issues/707
When some trainable model parameters are skipped in training,
the backward hooks registered in self.create_reduce_and_remove_grad_hooks() never run for them,
so they have no entry in norm_for_param_grads
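A minimal sketch of the guard this fix implies (the function and dictionary names here are hypothetical stand-ins, not DeepSpeed's actual implementation): when summing per-parameter gradient norms for CPU offload, check membership instead of assuming every parameter's backward hook populated an entry.

```python
def grad_norm_for_cpu_offload(norm_for_param_grads, param_ids):
    """Combine per-parameter grad norms into a global norm.

    Parameters skipped in training never triggered their backward
    hook, so they have no entry in norm_for_param_grads; guard the
    lookup rather than indexing unconditionally.
    """
    total = 0.0
    for pid in param_ids:
        # Skipped parameters simply contribute nothing to the norm.
        if pid in norm_for_param_grads:
            total += norm_for_param_grads[pid] ** 2
    return total ** 0.5


# Parameter 1 was skipped this step, so it has no recorded norm.
print(grad_norm_for_cpu_offload({0: 3.0, 2: 4.0}, [0, 1, 2]))
```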
* Trim space
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>