Fix deadlock in ExecutionTraceObserver (#119242)
Summary:
When tracing a compiled PyTorch module, the function convertIValue in execution_trace_observer.cpp calls TensorImpl->storage_offset(). That call triggers a recursive call into recordOperatorStart on the same thread, which then tries to lock ob.g_mutex while it is already held, causing a deadlock.
This diff fixes the deadlock by replacing std::mutex with std::recursive_mutex, which lets the thread that already owns the lock acquire it again.
Since PyTorch uses only one thread for the forward pass (FWD) and one thread for the backward pass (BWD), contention on the lock is very low, so the extra cost of a recursive mutex should not be a performance concern.
Test Plan:
Unit Test
buck test mode/dev-nosan caffe2/test:profiler -- test_execution_trace_with_pt2
Differential Revision: D53299183
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119242
Approved by: https://github.com/aaronenyeshi