Fix AOTAutograd 2.0 perf regression involving as_strided (#92255)
I suspect there is a deeper fix that avoids as_strided entirely, but in the regressed model the sizes and strides all lined up exactly, so this change is enough to fix the immediate regression.
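
To make the "sizes/strides all lined up exactly" observation concrete, here is a minimal sketch of the kind of check that makes an as_strided call redundant. It assumes the fix amounts to comparing tensor metadata before restriding, which this description suggests but does not spell out. The helper names `contiguous_strides` and `needs_as_strided` are hypothetical and are not taken from the PR; the sketch works on plain tuples so it runs without PyTorch.

```python
from typing import Sequence, Tuple


def contiguous_strides(size: Sequence[int]) -> Tuple[int, ...]:
    """Row-major strides (in elements) for a dense layout of the given size."""
    strides = []
    running = 1
    # Walk dimensions from innermost to outermost, accumulating the element count.
    for dim in reversed(size):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def needs_as_strided(
    actual_size: Sequence[int],
    actual_stride: Sequence[int],
    expected_size: Sequence[int],
    expected_stride: Sequence[int],
) -> bool:
    """Hypothetical check: restriding is only needed when the metadata differ."""
    return (
        tuple(actual_size) != tuple(expected_size)
        or tuple(actual_stride) != tuple(expected_stride)
    )


# Metadata already match: the as_strided call (and its backward) could be skipped.
size = (2, 3, 4)
same = needs_as_strided(size, contiguous_strides(size), size, contiguous_strides(size))

# A transposed layout has the same size but different strides: restriding is needed.
transposed = needs_as_strided((3, 2), (1, 3), (3, 2), contiguous_strides((3, 2)))

print(same, transposed)
```

With real tensors the same comparison would read the metadata from `t.size()` and `t.stride()`, and only fall back to `t.as_strided(size, stride)` when they differ.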
Repro command: `python benchmarks/dynamo/torchbench.py --performance --backend inductor --float16 --training --batch-size-file $(realpath benchmarks/dynamo/torchbench_models_list.txt) --only hf_Bert`
Before: 1.138x p=0.00
After: 1.162x p=0.00
Natalia pinpointed it to this line by comparing GPU traces and finding that the regressed PyTorch had two extra fill kernels and a memcpy:
Without regression:
![image](https://user-images.githubusercontent.com/13564/212726521-450e183d-7b36-4538-ad14-617e09c689a8.png)
With regression:
![image](https://user-images.githubusercontent.com/13564/212726469-4f3ff4b5-3f68-48cf-94d2-ddebb9216176.png)
...which the CPU profiler blamed on `AsStridedBackward`:
![image](https://user-images.githubusercontent.com/13564/212726953-16333bfc-8460-4445-90ad-7fe73c4173c2.png)
...which were then pinpointed to https://github.com/pytorch/pytorch/pull/92076/files#diff-df954bbf954d2dcb81f687876053267ffa4ddb36ed86b7d2bd76319ff2b94416R486-R489
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92255
Approved by: https://github.com/ngimel, https://github.com/bdhirsh