[PyTorch] AOTI: add CPU fast path in aoti_torch_empty_strided (#110877)
The CPU fast path seems to reduce benchmark time by 15-20%. Supersedes D49835545.
Differential Revision: [D49974460](https://our.internmc.facebook.com/intern/diff/D49974460/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110877
Approved by: https://github.com/chenyang78, https://github.com/jansel, https://github.com/desertfire
ghstack dependencies: #110876