[Inductor] Update the cpp_wrapper entry function signature (#121745)
Summary: Update the cpp_wrapper entry function to use AtenTensorHandle instead of at::Tensor. This makes compiling the generated cpp wrapper code much faster: test_cpu_cpp_wrapper.py drops from 35 min to 21 min, and test_cuda_cpp_wrapper.py from 21 min to 14 min.
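A minimal sketch of the idea, not the actual generated code: an entry function that passes tensors as opaque handles can be compiled without the tensor class definition, which is one plausible reason the generated wrapper builds faster. The `AtenTensorOpaque` forward declaration below is a local stand-in for the handle type, and `example_entry`, its `n` parameter, and `roundtrip_preserves_handles` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Local stand-in for the opaque handle type (assumption: a pointer to a
// forward-declared struct, so no tensor headers are needed here).
struct AtenTensorOpaque;
using AtenTensorHandle = AtenTensorOpaque*;

// Hypothetical entry function: inputs and outputs are arrays of opaque
// handles. The real generated signature may differ.
void example_entry(AtenTensorHandle* input_handles,
                   AtenTensorHandle* output_handles,
                   std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // Forward each handle unchanged; a real wrapper would launch kernels.
        output_handles[i] = input_handles[i];
    }
}

// Demonstration helper: builds two fake handles from local addresses and
// checks that the entry function passes them through untouched.
bool roundtrip_preserves_handles() {
    int a = 0;
    int b = 0;
    AtenTensorHandle inputs[2] = {
        reinterpret_cast<AtenTensorHandle>(&a),
        reinterpret_cast<AtenTensorHandle>(&b),
    };
    AtenTensorHandle outputs[2] = {nullptr, nullptr};
    example_entry(inputs, outputs, 2);
    return outputs[0] == inputs[0] && outputs[1] == inputs[1];
}
```

The handles are never dereferenced in this sketch, so the incomplete type is enough for the compiler.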
Differential Revision: [D54818715](https://our.internmc.facebook.com/intern/diff/D54818715)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121745
Approved by: https://github.com/chenyang78, https://github.com/jansel
ghstack dependencies: #121523, #121743, #121744