[Profiler x RPC] Enable RPC Server Global Profiler (#38847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38847
See motivation and design in https://github.com/pytorch/pytorch/issues/38845.
Close https://github.com/pytorch/pytorch/issues/38845.
Changes:
- Add pre-request and post-response hooks to the RPC "request_callback_impl.cpp". For each thread that executes an RPC handler, check whether server-side global profiling is on. If it is, enable profiling on that thread and, after the response is sent, merge the thread-local profiling result into the global profiling state.
- Add a context-manager-style Python API that parses the profiling Events into ranges represented by FunctionEvent.
- Add data structures that serve as the global profiling state. They support nested profiling ranges and act as a container for consolidating results from multiple threads.
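
The flow described in the changes above can be sketched in pure Python. This is only an illustration of the idea, not the code in this PR: `GlobalProfilingState`, `profile`, `snapshot`, `merge`, and `handle_request` are hypothetical names, events are plain `(name, duration)` tuples rather than FunctionEvent objects, and the sketch assumes that a handler's events are merged into every range that was open when the request started.

```python
import threading
import time
from contextlib import contextmanager


class GlobalProfilingState:
    """Hypothetical process-wide state: a stack of open profiling ranges."""

    def __init__(self):
        self._lock = threading.Lock()
        self._open_ranges = []  # innermost range is last

    @contextmanager
    def profile(self):
        """Context-style API: open a range and yield its result container."""
        results = []  # one entry per handler thread's event list
        with self._lock:
            self._open_ranges.append(results)
        try:
            yield results
        finally:
            with self._lock:
                self._open_ranges.pop()  # assumes ranges close in LIFO order

    def snapshot(self):
        """Return the ranges that are open right now (empty if profiling is off)."""
        with self._lock:
            return list(self._open_ranges)

    def merge(self, ranges, thread_events):
        """Consolidate one thread's events into each of the given ranges."""
        with self._lock:
            for results in ranges:
                results.append(thread_events)


def handle_request(state, name, handler):
    """Run an RPC-like handler with pre-request and post-response hooks."""
    ranges = state.snapshot()  # pre-request hook: is global profiling on?
    if not ranges:
        return handler()  # profiling is off, so record nothing
    events = []  # thread-local events for this request
    start = time.perf_counter()
    try:
        return handler()
    finally:
        events.append((name, time.perf_counter() - start))
        state.merge(ranges, events)  # post-response hook


state = GlobalProfilingState()
unprofiled = handle_request(state, "noop", lambda: 0)  # no range is open yet
with state.profile() as outer:
    handle_request(state, "add", lambda: 1 + 1)
    with state.profile() as inner:
        worker = threading.Thread(
            target=handle_request, args=(state, "mul", lambda: 2 * 3)
        )
        worker.start()
        worker.join()

outer_names = [events[0][0] for events in outer]
inner_names = [events[0][0] for events in inner]
```

In this sketch the outer range ends up with the events of both requests, while the inner range only holds the request handled while it was open.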
Test:
- Add a test that uses nested profiling ranges and inspects the resulting profiling events.
ghstack-source-id: 104991517
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_server_process_global_profiler
Differential Revision: D5665992
fbshipit-source-id: 07f3bef5efd33d1214ef3404284c3803f5deca26