[rpc] Allow profiling in RPC to work with torchscript function invocations (#36275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36275
Support for calling a TorchScript function from within RPC was added after
the initial profiler support for RPC, so TorchScript functions invoked under
RPC were not being recorded correctly. This diff passes the `RecordFunction` to
the `_invoke_torchscript..` calls, similar to what is done for builtins and UDFs.
However, this is only a temporary solution: the standalone use of
`RecordFunction` in the RPC code will be removed in
https://github.com/pytorch/pytorch/pull/35055. This diff unblocks
recording of TorchScript functions in the meantime.
ghstack-source-id: 101800134
Test Plan:
Added tests for calling a script function with builtin, sync, and
async. The output looks like the following:
```
------ --------------- --------------- --------------- --------------- ---------------
> Name Self CPU
total % Self CPU total CPU total % CPU total CPU time avg Number of Calls
> ---------------------------------------------------------------------------------------------------------- ---------
------ --------------- --------------- --------------- --------------- ---------------
> rpc_sync#__torch__.torch.testing._internal.distributed.rpc.rpc_test.my_script_func(worker1 -> worker2) 99.92%
1.056s 99.92% 1.056s 1.056s 1
> select 0.04%
383.661us 0.04% 383.661us 95.915us 4
> fill_ 0.02%
210.966us 0.02% 210.966us 52.741us 4
> to 0.00%
26.276us 0.00% 26.276us 26.276us 1
> empty 0.02%
159.802us 0.02% 159.802us 79.901us 2
> set_ 0.01%
93.818us 0.01% 93.818us 93.818us 1
> ---------------------------------------------------------------------------------------------------------- ---------
------ --------------- --------------- --------------- --------------- ---------------
> Self CPU time total: 1.057s
```
Note that we use `torch.jit._qualified_name` to get the name of the script fn.
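Judging from the first row of the output above, the recorded event name appears
to follow the pattern `<rpc call type>#<qualified fn name>(<src worker> -> <dst worker>)`.
A minimal sketch of that composition in plain Python is below. Both helper names
are hypothetical, and `approx_qualified_name` is only a rough stand-in for
`torch.jit._qualified_name`, whose actual behavior is not reproduced here:

```python
def approx_qualified_name(fn):
    """Hypothetical stand-in for torch.jit._qualified_name.

    Assumption: the qualified name is the function's module path and name
    prefixed with "__torch__", which matches the row shown in the test output.
    """
    return "__torch__.{}.{}".format(fn.__module__, fn.__name__)


def rpc_profiling_key(call_type, func_name, src_worker, dst_worker):
    """Hypothetical helper: build an event name shaped like the profiler row.

    call_type  -- e.g. "rpc_sync"
    func_name  -- qualified name of the invoked function
    src_worker -- name of the calling worker
    dst_worker -- name of the worker that runs the function
    """
    return "{}#{}({} -> {})".format(call_type, func_name, src_worker, dst_worker)


# Rebuild the name seen in the test output from its parts.
key = rpc_profiling_key(
    "rpc_sync",
    "__torch__.torch.testing._internal.distributed.rpc.rpc_test.my_script_func",
    "worker1",
    "worker2",
)
print(key)
```

Including the source and destination worker names in the key keeps events from
different worker pairs distinguishable in the aggregated profiler table.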
Differential Revision: D20930453
fbshipit-source-id: c6d940aa44fcd9dd8a1a29c156aa19e0d8428d60