efba6302 - Issue a warning when zero_grad is used in DataParallel (#33064)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/31768; second attempt at https://github.com/pytorch/pytorch/issues/32870.

DataParallel creates replicas of the original `nn.Module` with the parameters duplicated onto the destination devices. Calling `backward` propagates gradients onto the original module's parameters, but calling `zero_grad` on a replica does not clear the gradients from the parent module. Moreover, calling `backward` from inside a replica was already broken, since the replica's parameters are not leaf nodes in autograd. So, we should issue a warning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/33064
Differential Revision: D19790178
Pulled By: albanD
fbshipit-source-id: 886f36640acef4834a6fa57a26ce16b42ff0e9ad
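
The behavior the message describes can be illustrated with a short script. The sketch below is not from the commit; it assumes a machine with at least two CUDA devices so that `DataParallel` actually replicates the module, and the `Net` module name is hypothetical.

```python
# Minimal sketch of the problem: zero_grad() called on a DataParallel replica
# acts on the replica's copied parameters, not on the parent module, so the
# gradients accumulated on the original parameters are left untouched.
import torch
import torch.nn as nn

class Net(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)

    def forward(self, x):
        # Inside DataParallel's forward, `self` is a replica; this zero_grad()
        # does not clear the parent module's gradients (PyTorch warns here
        # after this change).
        self.zero_grad()
        return self.fc(x)

model = Net().cuda()
parallel = nn.DataParallel(model)
loss = parallel(torch.randn(8, 4, device="cuda")).sum()
loss.backward()
# Despite the replica's zero_grad() call, the original module still holds
# gradients from backward().
print(model.fc.weight.grad is not None)
```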