[Profiler] Defer KinetoEvent and GenericTraceActivity creation to post processing. (#71539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71539
This is the first of the optimization changes. Because Kineto is sometimes unavailable, we cannot use it as a storage mechanism. KinetoEvent currently fills this role; however, KinetoEvent is VERY expensive. A second issue is that, because we currently write to two objects, we hold the state lock for the duration of both event creations, which is not ideal.
This applies the following optimizations:
1) Intermediate data is stored in a deque in KinetoThreadLocalState, which avoids a double copy (data -> KinetoObserverContext -> KinetoEvent). The new KinetoObserverContext just holds a pointer to its element in the deque.
2) OpEventData is much lighter weight (though still far from ideal)
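A minimal C++ sketch of the storage pattern described above, under simplifying assumptions. All names here are hypothetical stand-ins, not the real profiler classes: `OpRecord` plays the role of OpEventData, `ObserverCtx` of KinetoObserverContext, `ProfilerState` of KinetoThreadLocalState, and `FinalEvent` of KinetoEvent; `onEnter`, `onExit`, and `finalize` are illustrative methods. It shows the lock held only for a single append on op entry, the context storing a pointer into the deque instead of a copy, and the expensive events built only during post processing.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lightweight record filled in while an op runs.
struct OpRecord {
  std::string name;
  int64_t start_us;
  int64_t end_us;  // -1 until the op exits
};

// Hypothetical heavyweight event, only built during post processing.
struct FinalEvent {
  std::string name;
  int64_t duration_us;
};

// The observer context holds a pointer into the deque, not a copy of the data.
struct ObserverCtx {
  OpRecord* record;
};

class ProfilerState {
 public:
  // Op entry: the lock covers one cheap append and nothing else.
  ObserverCtx onEnter(std::string name, int64_t now_us) {
    std::lock_guard<std::mutex> guard(mutex_);
    records_.push_back(OpRecord{std::move(name), now_us, -1});
    // push_back on a std::deque never invalidates references to existing
    // elements, so pointers handed out earlier stay valid.
    return ObserverCtx{&records_.back()};
  }

  // Op exit: write through the pointer. Assumes (for this sketch) that
  // finalize() only runs after every op has exited, so no lock is taken.
  static void onExit(const ObserverCtx& ctx, int64_t now_us) {
    assert(ctx.record != nullptr);
    ctx.record->end_us = now_us;
  }

  // Post processing: convert the cheap records into heavyweight events.
  std::vector<FinalEvent> finalize() {
    std::lock_guard<std::mutex> guard(mutex_);
    std::vector<FinalEvent> out;
    out.reserve(records_.size());
    for (const OpRecord& r : records_) {
      if (r.end_us < 0) continue;  // skip ops that never exited
      out.push_back(FinalEvent{r.name, r.end_us - r.start_us});
    }
    records_.clear();
    return out;
  }

 private:
  std::mutex mutex_;
  std::deque<OpRecord> records_;
};
```

Using `std::deque` rather than `std::vector` is what makes handing out raw pointers safe here: appending to a deque never relocates existing elements, whereas a vector may reallocate and leave every outstanding pointer dangling.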
Test Plan:
Script: P470970719
Result: P470970794
For the base case (no special flags), this yields a 40% reduction in the `profiler_kineto` portion of the overhead.
Reviewed By: aaronenyeshi
Differential Revision: D32691800
fbshipit-source-id: 3d90d74000105d0ef1a7cb86d01236610e7e3bbd
(cherry picked from commit fbca1b05bac60ed81d6cd3b2cfdb7ffb94ebeb6a)