Add context manager to save tensors on CPU (#61928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61928
Fix #57100.
Creates a context manager `torch.autograd.graph.save_on_cpu()` which registers default hooks under which all tensors saved during the forward pass are actually copied* to CPU, then copied back to the appropriate device for the backward pass.
*If the tensor was already on CPU, the entire operation is a no-op.
If the tensor is on GPU, we copy it into pinned memory (`pin_memory`) during packing so that the copy back to the device during unpacking can be done asynchronously.
See the [benchmark](https://github.com/pytorch/pytorch/pull/61928#issuecomment-885089279) and the [note about training large models](https://github.com/pytorch/pytorch/pull/61928#issuecomment-887009448).
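A minimal usage sketch, assuming the context-manager form `torch.autograd.graph.save_on_cpu(pin_memory=...)` is the API that ships (run here on CPU tensors, where the offload is a no-op; on GPU tensors the saved activations would be offloaded to pinned host memory instead):

```python
import torch
from torch.autograd.graph import save_on_cpu

a = torch.randn(5, requires_grad=True)
b = torch.randn(5, requires_grad=True)

# Inside the context manager, tensors saved for backward (here: a and b,
# saved by the multiplication) are packed to CPU during the forward pass
# and unpacked back to their original device when backward runs.
with save_on_cpu(pin_memory=True):
    y = (a * b).sum()

y.backward()

# Gradients are identical to what eager autograd would produce:
# d/da sum(a*b) = b, d/db sum(a*b) = a.
assert torch.allclose(a.grad, b)
assert torch.allclose(b.grad, a)
```

With `pin_memory=True` the CPU-side buffer is allocated in pinned memory (when CUDA is available), which is what allows the device-to-host and host-to-device copies to overlap with compute.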
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D29848526
Pulled By: Varal7
fbshipit-source-id: 3d289cddd4fa377bd4884ba0d569fa47c777d9e5