[RFC] Profile rpc_async call from JIT (#40652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40652
Resolves https://github.com/pytorch/pytorch/issues/40304, but looking for
feedback on whether there is a better approach for this.
To profile `rpc_async` calls made within a TorchScript function, we
add the profiling logic to `rpcTorchscript`, which is the point where the RPC
is dispatched and which the JIT `rpc_async` operator calls. The approach is
similar to how profiling is done in the Python API. If profiling is
enabled, we call `record_function_enter`, which creates a `RecordFunction`
object and runs its start callbacks. We then schedule the end callbacks for
this `RecordFunction` to run when the JIT future completes.
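The enter-then-defer-end pattern described above can be modeled in plain Python. This is only an illustrative sketch, not the C++ code in this PR: `RecordSketch`, `record_enter`, and `dispatch_with_profiling` are hypothetical names, and `concurrent.futures.Future` stands in for the JIT future.

```python
import time
from concurrent.futures import Future


class RecordSketch:
    """Hypothetical stand-in for a RecordFunction: holds a start and an end time."""

    def __init__(self, name):
        self.name = name
        self.start = None
        self.end = None


def record_enter(name):
    # Analogue of record_function_enter: create the record and run the
    # "start callback" right away (here, just take a timestamp).
    rec = RecordSketch(name)
    rec.start = time.perf_counter()
    return rec


def dispatch_with_profiling(name, send, profiling_enabled):
    # `send` is a hypothetical callable that dispatches the RPC and returns a Future.
    rec = record_enter(name) if profiling_enabled else None
    fut = send()
    if rec is not None:
        # The "end callback" is deferred until the future completes, so the
        # recorded span covers the whole asynchronous call.
        def _on_done(_fut):
            rec.end = time.perf_counter()

        fut.add_done_callback(_on_done)
    return fut, rec


# Usage: the record stays open until the future is resolved.
pending = Future()
fut, rec = dispatch_with_profiling("rpc_async#add", lambda: pending, True)
still_open = rec.end is None
pending.set_result(3)
closed_after_completion = rec.end is not None
```

With profiling disabled, `dispatch_with_profiling` returns the future unchanged and no record is created.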
One caveat is that `rpcTorchscript` can also be called by `rpc_async` from a
non-JIT function, in which case the profiling logic lives in Python. We add a
check to ensure that we don't profile the call twice in this case.
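One way such a guard could look is sketched below. The summary does not say how the check is implemented, so the thread-local marker and every name here (`_state`, `caller_is_profiling`, `dispatch`, `python_rpc_async`) are assumptions made for illustration, not PyTorch APIs.

```python
import threading

_state = threading.local()  # hypothetical per-thread marker, not a PyTorch API


def caller_is_profiling():
    # True when an outer (Python-level) layer has already recorded this call.
    return getattr(_state, "active", False)


def dispatch(events, name, profiling_enabled):
    # Shared dispatch point: record only if profiling is on and no outer
    # layer has already claimed the call.
    if profiling_enabled and not caller_is_profiling():
        events.append(("dispatch", name))
    return name


def python_rpc_async(events, name, profiling_enabled):
    # Non-JIT path: the Python layer records the event itself, then marks
    # the thread so the shared dispatch point skips its own recording.
    if profiling_enabled:
        events.append(("python", name))
        _state.active = True
    try:
        return dispatch(events, name, profiling_enabled)
    finally:
        _state.active = False


jit_events = []
dispatch(jit_events, "add", True)             # JIT path: recorded at dispatch

py_events = []
python_rpc_async(py_events, "add", True)      # Python path: recorded once, in Python
```

In both paths exactly one event is recorded per call, which is the property the check is meant to preserve.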
ghstack-source-id: 107109485
Test Plan: Added relevant unit tests.
Differential Revision: D22270608
fbshipit-source-id: 9f62d1a2a27f9e05772d0bfba47842229f0c24e1