[PyTorch] AOTI: generate reused thread_locals when tensors provably have static shape (#110892)
If a tensor can be reused and provably has a static shape, the generated code can keep it in a thread_local and reuse it across iterations instead of allocating it again on every run.
This is meant as a quick-to-ship overhead reduction for CPU overhead-bound use cases, one that does not rely on memory planning.
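
For illustration only, the caching idea can be sketched in plain C++. This is not the code AOTI emits: the shape constants, `cached_scratch`, `run_once`, and the allocation counter are all hypothetical names, and a `std::vector<float>` stands in for a tensor. The sketch assumes the shape is known when the code is generated, so the buffer size is a compile-time constant.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical static shape, assumed known at code-generation time.
constexpr std::size_t kRows = 4;
constexpr std::size_t kCols = 8;
constexpr std::size_t kNumel = kRows * kCols;

// Per-thread allocation counter, present only to make the caching observable.
thread_local int g_alloc_count = 0;

// Returns this thread's scratch buffer. The thread_local initializer runs
// once per thread, on the first call; later calls reuse the same storage.
std::vector<float>& cached_scratch() {
  thread_local std::vector<float> buf = [] {
    ++g_alloc_count;
    return std::vector<float>(kNumel);
  }();
  return buf;
}

// One "iteration": writes an intermediate result into the cached buffer
// and reduces it. Every element is overwritten before it is read, so
// stale contents from the previous iteration are harmless.
float run_once(const std::vector<float>& input) {
  assert(input.size() == kNumel);  // the static-shape assumption
  std::vector<float>& tmp = cached_scratch();
  float sum = 0.0f;
  for (std::size_t i = 0; i < kNumel; ++i) {
    tmp[i] = input[i] * 2.0f;
    sum += tmp[i];
  }
  return sum;
}
```

Calling `run_once` repeatedly on one thread allocates the buffer once. Because the buffer is thread_local, concurrent runs on different threads each get their own copy and do not race on it; the trade-off is that the memory stays alive for the lifetime of the thread.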
Differential Revision: [D50023678](https://our.internmc.facebook.com/intern/diff/D50023678/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110892
Approved by: https://github.com/bertmaher
ghstack dependencies: #110876, #110877, #110909