fix default partitioner: save sizes instead of tensor for backward when possible (#91012)
This should fix hf_Longformer, AllenaiLongformerBase, and tacotron2 with dynamic shapes. Example repro:
```
TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/dynamo/torchbench.py --accuracy --backend aot_eager --training --only hf_Longformer
```
used to fail with:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4, 1024, 12, 513]], which is output 0
of AsStridedBackward0, is at version 6; expected version 4 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient,
with torch.autograd.set_detect_anomaly(True).
```
The problem is that:
(1) when we have a tensor from the forward whose sizes are needed in the backward, we were saving the actual tensor for backward and reading the sizes off of it inside the backward graph (bad for perf)
(2) if that tensor happens to be a graph input that gets mutated, we end up with the above error. Autograd yells at you if you save a tensor for backward and then mutate it in place.
I confirmed that this problem doesn't happen for the min cut partitioner.
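Here is a minimal standalone sketch of the failure mode (a hand-written `torch.autograd.Function`, not the partitioner itself; the class names `NeedsSizesBad` / `NeedsSizesGood` are made up for illustration): saving the whole tensor registers it with autograd's version counter, so a later in-place mutation of that input trips the error above, while saving only the sizes does not.
```python
import torch

class NeedsSizesBad(torch.autograd.Function):
    """Saves the whole tensor even though backward only needs its sizes."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)   # registers x with autograd's version counter
        return x.sum()

    @staticmethod
    def backward(ctx, grad):
        (x,) = ctx.saved_tensors   # version check happens here and can fail
        return grad.expand(x.shape)

class NeedsSizesGood(torch.autograd.Function):
    """Saves only the sizes, so in-place mutation of x is harmless."""

    @staticmethod
    def forward(ctx, x):
        ctx.x_shape = x.shape      # plain Python data, no version tracking
        return x.sum()

    @staticmethod
    def backward(ctx, grad):
        return grad.expand(ctx.x_shape)

x = torch.randn(4, requires_grad=True)

bad_in = x.clone()
out = NeedsSizesBad.apply(bad_in)
bad_in.add_(1)                     # in-place mutation after saving for backward
# out.backward()                   # RuntimeError: "... modified by an inplace operation ..."

good_in = x.clone()
out = NeedsSizesGood.apply(good_in)
good_in.add_(1)                    # same mutation, but nothing was saved
out.backward()                     # works; only the sizes were kept
print(x.grad)                      # tensor([1., 1., 1., 1.])
```
The fix follows the second pattern: when only the sizes of a forward tensor are needed in the backward, the default partitioner now passes the sizes to the backward graph instead of saving the tensor itself.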
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91012
Approved by: https://github.com/ezyang