[pytorch] activation checkpointing: enable mixing tensor without requires_grad (#45934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45934
https://pytorch.org/docs/stable/checkpoint.html PyTorch checkpointing requires every input to the checkpointed function to require grad, but this assumption does not always hold. Consider the following two examples:
```
output = MultiheadedMaskedAtten(input, mask)
output = LSTM(input, seq_length)
```
Both seq_length and mask are tensors that will never require grad. Currently, if you try to checkpoint such a function, torch.autograd.backward complains:
```
File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/autograd/function.py
", line 87, in apply
return self._forward_cls.backward(self, *args)
File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/utils/checkpoint.py"
, line 99, in backward
torch.autograd.backward(outputs, args)
File "/mnt/xarfuse/uid-124297/7d159c34-seed-nspid4026531836-ns-4026531840/torch/autograd/__init__.py
", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: element 1 of tensors does not require grad and does not have a grad_fn
```
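For reference, a minimal sketch of the failure mode (module and tensor names here are illustrative, not the original model code): the checkpointed function takes, and passes through, a tensor that never requires grad.
```
import torch
from torch.utils.checkpoint import checkpoint

def masked_attn_like(x, mask):
    # stand-in for an attention-style module: the second return value is a
    # tensor that never requires grad (the mask is passed through unchanged)
    scores = (x * mask).sum(dim=-1)
    return scores, mask

x = torch.randn(2, 4, requires_grad=True)
mask = torch.ones(2, 4)  # requires_grad=False on purpose

scores, _ = checkpoint(masked_attn_like, x, mask)
scores.sum().backward()  # used to raise: "element 1 of tensors does not require grad ..."
print(x.grad.shape)
```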
This diff allows skipping the non-grad-requiring tensors when running autograd.backward in the checkpoint backward pass.
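Roughly, after recomputing the forward pass, only the outputs that require grad (and their corresponding incoming gradients) are handed to torch.autograd.backward; inputs that never required grad simply get None back. A simplified sketch of the idea (not the literal code in this diff):
```
# inside CheckpointFunction.backward, after recomputing `outputs`
# from the detached inputs (simplified sketch)
outputs_with_grad = []
grad_outputs_with_grad = []
for out, grad_out in zip(outputs, grad_outputs):
    if out.requires_grad:
        outputs_with_grad.append(out)
        grad_outputs_with_grad.append(grad_out)

torch.autograd.backward(outputs_with_grad, grad_outputs_with_grad)

# tensors that never required grad get None as their gradient
grads = tuple(inp.grad if isinstance(inp, torch.Tensor) else None
              for inp in detached_inputs)
```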
Documentation for this behavior has been added as well.
Test Plan: Added a unit test to make sure checkpoint() works when only some of the input tensors require grad.
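A sketch of the kind of test that exercises this path (names are illustrative, not the exact test added to the suite):
```
import torch
from torch.utils.checkpoint import checkpoint

def test_checkpoint_partial_grad():
    def run_fn(tensor1, tensor2):
        # tensor2 never requires grad and is returned untouched
        return tensor1 * 2, tensor2

    inp = torch.rand(2, 2, requires_grad=True)
    mask = torch.rand(2, 2, requires_grad=False)
    out, _ = checkpoint(run_fn, inp, mask)
    out.sum().backward()
    assert inp.grad is not None
    assert mask.grad is None
```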
Differential Revision: D24094764
fbshipit-source-id: 6557e8e74132d5a392526adc7b57b6998609ed12