Issue a warning when zero_grad is used in DataParallel (#33064)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31768, second attempt at https://github.com/pytorch/pytorch/issues/32870
DataParallel creates replicas of the original `nn.Module` with the parameters duplicated onto the destination devices. Calling `backward` propagates gradients onto the original module's parameters, but calling `zero_grad` on a replica does not clear the gradients of the parent module. Using gradients on a replica was already broken anyway, since a replica's parameters are not leaf nodes in autograd and therefore never accumulate gradients. So, we now issue a warning when `zero_grad` is called on a replica.
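A minimal sketch of the situation this warns about, assuming at least two CUDA devices; `ZeroingModel` is a hypothetical example module, not code from this PR:

```python
import torch
import torch.nn as nn

class ZeroingModel(nn.Module):
    """Toy module that calls zero_grad() inside forward().

    Under nn.DataParallel, `self` inside forward() is a replica, so
    this zero_grad() call cannot clear the original module's gradients;
    with this change it emits a warning instead of silently doing nothing.
    """

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        self.zero_grad()  # no-op on a replica -> now warns
        return self.linear(x)


if torch.cuda.device_count() > 1:
    model = nn.DataParallel(ZeroingModel().cuda())
    out = model(torch.randn(8, 4, device="cuda"))
    # Gradients accumulate on the original module's parameters,
    # not on the replicas' (non-leaf) parameters.
    out.sum().backward()
```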
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33064
Differential Revision: D19790178
Pulled By: albanD
fbshipit-source-id: 886f36640acef4834a6fa57a26ce16b42ff0e9ad