pytorch
55ca6901 - [CheckpointWrapper] Decouple CPU offload (#84907)

Commit

2 years ago

[CheckpointWrapper] Decouple CPU offload (#84907)

This fixes activation offload for the checkpoint wrapper, which was previously broken. It was broken because it was tightly coupled with activation checkpointing, i.e. we did:

```
with save_on_cpu:
    checkpoint(module_forward())
```

which would not offload any activation tensors to CPU, because autograd never saved those activations in the first place: the checkpoint implementation takes priority.

Now, if `offload_to_cpu` is specified, we only do `save_on_cpu` and no checkpoint, so all intermediate tensors are offloaded to CPU instead of checkpointed.

These wrappers can be composed, i.e. if we have

`(Linear, Linear) -> (Linear, Linear) -> (Linear, Linear)`

we can do

`Offload(checkpoint(Linear, Linear) -> checkpoint(Linear, Linear) -> checkpoint(Linear, Linear))`

and the inner tensors would be checkpointed while the outer ones would be offloaded.

Differential Revision: [D39448882](https://our.internmc.facebook.com/intern/diff/D39448882/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84907
Approved by: https://github.com/awgu
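The composition described in the commit message can be sketched with public PyTorch APIs only. This is a minimal illustration, not the code from the PR: `Offload` and `Checkpointed` are hypothetical stand-in modules written for this example, built on `torch.autograd.graph.save_on_cpu` and `torch.utils.checkpoint.checkpoint`. The layer sizes and the choice of `use_reentrant=False` are also assumptions.

```python
import torch
import torch.nn as nn
from torch.autograd.graph import save_on_cpu
from torch.utils.checkpoint import checkpoint


class Offload(nn.Module):
    """Hypothetical wrapper: run the wrapped module under save_on_cpu only."""

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module

    def forward(self, x):
        # Tensors that autograd saves for backward inside this context are
        # kept on CPU and moved back to their device when backward runs.
        with save_on_cpu(pin_memory=False):
            return self.module(x)


class Checkpointed(nn.Module):
    """Hypothetical wrapper: recompute the wrapped module during backward."""

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module

    def forward(self, x):
        # Activations inside the wrapped module are not kept; they are
        # recomputed from x when backward reaches this segment.
        return checkpoint(self.module, x, use_reentrant=False)


def make_block() -> nn.Sequential:
    # One "(Linear, Linear)" unit from the commit message (sizes are arbitrary).
    return nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))


torch.manual_seed(0)
blocks = [make_block() for _ in range(3)]

# Unwrapped reference model and the composed model share the same parameters.
plain = nn.Sequential(*blocks)
wrapped = Offload(nn.Sequential(*(Checkpointed(b) for b in blocks)))

x = torch.randn(4, 8)
x_ref = x.clone().requires_grad_(True)
x_wrapped = x.clone().requires_grad_(True)

plain(x_ref).sum().backward()
wrapped(x_wrapped).sum().backward()

# Wrapping changes where activations live, not the gradients themselves.
grads_match = torch.allclose(x_ref.grad, x_wrapped.grad)
```

In this sketch the outer `Offload` only affects tensors saved outside the checkpointed segments, which matches the commit's description of inner tensors being checkpointed and outer ones offloaded. Using `Offload` on its own, without `Checkpointed`, corresponds to the `offload_to_cpu` behavior after the fix: every saved intermediate tensor goes to CPU and nothing is recomputed.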

Author

rohan-varma

Committer

pytorchmergebot

Parents

166ea7e6
