DeepSpeed
7fcc8911 - Fix zero stage2 cpu_offload when some model trainable parameters skipped in training (#861)

Committed 4 years ago
Fix zero stage2 cpu_offload when some model trainable parameters skipped in training (#861)

* Fix zero stage2 cpu_offload when some model trainable parameters are skipped in training, as in https://github.com/microsoft/DeepSpeed/issues/707. When trainable parameters are skipped during training, the backward hooks registered for them in self.create_reduce_and_remove_grad_hooks() never run, so they have no entry in norm_for_param_grads.
* Trim space
* Trim space

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
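The failure mode the commit describes can be sketched in plain Python (this is an illustrative stand-in, not DeepSpeed's actual implementation; only the name norm_for_param_grads comes from the commit message, the rest is assumed): when a parameter is skipped in the forward/backward pass, its grad hook never records a norm, so any later pass that indexes the norms dict unconditionally raises a KeyError. The fix is to skip parameters that have no recorded norm.

```python
# Illustrative sketch of the fix (not DeepSpeed's actual code): sum squared
# gradient norms while tolerating parameters whose backward hooks never ran.
# `norm_for_param_grads` mirrors the name in the commit message; everything
# else is a simplified stand-in.

def total_grad_norm_squared(trainable_params, norm_for_param_grads):
    """Sum squared grad norms, skipping params whose hooks never fired."""
    total = 0.0
    for p in trainable_params:
        # Before the fix, code indexed norm_for_param_grads[p] directly and
        # raised KeyError for params skipped during training.
        if p in norm_for_param_grads:
            total += norm_for_param_grads[p] ** 2
    return total

# Example: param "b" was skipped this step, so its hook recorded no norm.
norms = {"a": 3.0, "c": 4.0}
print(total_grad_norm_squared(["a", "b", "c"], norms))  # → 25.0
```

The membership check makes the norm computation agree with which hooks actually fired, instead of assuming every trainable parameter produced a gradient.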