[Autograd] `expand_as` instead of `clone` to get `AccumulateGrad` (#96356)
This PR makes a minor change to the multi-grad hook implementation. The hook only needs the `clone()`/`expand_as()` call to materialize a `grad_fn` whose next edge is the tensor's `AccumulateGrad` node, and `expand_as()` does that without copying the tensor's data, so this should decrease peak memory by avoiding one `clone()` per tensor passed into the multi-grad hook. Let me know if there are technical reasons why we need to clone. If so, is there a way for some use cases to avoid cloning?
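For context, here is a minimal sketch (not the actual `register_multi_grad_hook` code; the helper name is made up for illustration) of how a cheap op like `expand_as` can be used to reach a leaf tensor's `AccumulateGrad` node:

```python
import torch

def _get_acc_grad_node(t: torch.Tensor):
    # Applying a lightweight autograd op to a leaf tensor produces a grad_fn
    # whose first next edge is the leaf's AccumulateGrad node. expand_as does
    # not copy data, unlike clone, which launches a DtoD memcpy on CUDA.
    return t.expand_as(t).grad_fn.next_functions[0][0]

leaf = torch.randn(3, requires_grad=True)
acc_grad = _get_acc_grad_node(leaf)
print(type(acc_grad).__name__)  # AccumulateGrad
```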
Before with `clone()`:
![Screenshot 2023-03-08 at 6 08 41 PM](https://user-images.githubusercontent.com/31054793/223873111-ad9105ab-2958-45a1-a2f5-18e9b254c710.png)
After with `expand_as()` -- no more "Memcpy DtoD" kernels:
![Screenshot 2023-03-08 at 6 08 48 PM](https://user-images.githubusercontent.com/31054793/223873104-670b6abc-cd5c-4d1e-a316-cea1bef5832a.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96356
Approved by: https://github.com/soulitzer