[aten] Split some at::launch code into at::internal::launch_no_thread_state() (#38477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38477
A few specific uses (e.g. Thrift RPC parsing) don't need the source
thread's state to be copied over. In microbenchmarks, copying it seems
to add ~500ns, so split the at::launch code across two functions so that
such callers can use at::internal::launch_no_thread_state() directly.
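
The shape of the split can be sketched roughly as follows. This is a
simplified stand-in rather than the actual ATen code: the `sketch`
namespace, the `tls_state` string, and the use of std::async with a
returned std::future are assumptions made so the example is
self-contained.

```cpp
#include <functional>
#include <future>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in for the per-thread state that the full launch
// path copies to the worker. Not the real ATen thread-local state type.
thread_local std::string tls_state = "default";

namespace internal {

// Cheap path: hand fn to a worker thread as-is. Nothing from the
// calling thread is captured, so the worker sees its own defaults.
inline std::future<void> launch_no_thread_state(std::function<void()> fn) {
  return std::async(std::launch::async, std::move(fn));
}

} // namespace internal

// Full path: snapshot the caller's state, install it on the worker,
// then run fn. The snapshot and install are the extra per-call cost.
inline std::future<void> launch(std::function<void()> fn) {
  std::string captured = tls_state;
  return internal::launch_no_thread_state(
      [captured = std::move(captured), fn = std::move(fn)]() {
        tls_state = captured;
        fn();
      });
}

} // namespace sketch
```

A caller that sets `sketch::tls_state` and then uses `sketch::launch`
sees that value inside the task, while a task submitted through
`sketch::internal::launch_no_thread_state` sees only the worker's own
default.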
ghstack-source-id: 104190095
Test Plan:
- Existing code using at::launch exercises this codepath, so buck test mode/dev-nosan caffe2/test/...
- The split version is exercised primarily by the Thrift-based change layered on top of this.
Differential Revision: D21573168
fbshipit-source-id: 2ef1f196b5177634d4ee7fdca7371d36906a69d6