Improve checkpoint docs to warn users about detached gradient issues (#37266)
Summary:
See https://discuss.pytorch.org/t/training-with-gradient-checkpoints-torch-utils-checkpoint-appears-to-reduce-performance-of-model/78102/3?u=jwl for details.
Updated the docs to warn users about issues with checkpointing models that use `detach()` or `torch.no_grad()` to freeze layers/weights during training. When a model is frozen this way, training with `checkpoint` fails: `checkpoint` forces its outputs to require gradients, but the recomputed forward pass produces outputs that do not, so the backward pass raises:
```
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
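For context, a minimal sketch of the failure mode (the `frozen_segment` function is hypothetical, and this assumes the reentrant checkpoint implementation that is the default in the current codebase):

```
import torch
from torch.utils.checkpoint import checkpoint

def frozen_segment(x):
    # Freezing inside the checkpointed segment (detach() behaves the same):
    # when backward recomputes this function, the output has no grad_fn.
    with torch.no_grad():
        return x * 2

x = torch.randn(2, 10, requires_grad=True)
out = checkpoint(frozen_segment, x)  # output is forced to require grad
out.sum().backward()
# RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```

The same error appears whenever everything inside the checkpointed segment is detached from the graph, since the recomputed outputs then have no grad_fn to backpropagate through.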
It may be possible to fix this directly in the code, but I am not sure how to do so in the current codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37266
Differential Revision: D21262558
Pulled By: mrshenli
fbshipit-source-id: 529cf370534504baf8937ef17dac5d6916fbf5ae