[Profiler] Factor common logic into `torch/csrc/profiler/api.h` (#69459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69459
This change breaks the dependency between the Kineto and legacy profilers; instead of `profiler_kineto.h` including `profiler_legacy.h`, both now include `profiler/api.h`. As part of this refactor, I introduced some intermediate classes to keep legacy behavior from leaking into the Kineto profiler:
1) ProfilerThreadLocalState has become ProfilerThreadLocalStateBase, which only handles the config and the callback handle. The legacy and Kineto profilers each inherit from it and implement their own, largely disjoint, logic.
2) CUDAStubs is now a pure virtual class, which makes the interface more readable; the "always fail" behavior has moved to a `DefaultCUDAStubs` class in `api.cpp`.
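
For illustration only, the shape of the two changes above can be sketched roughly as follows. This is a simplified stand-in, not the actual declarations in `profiler/api.h`: `ProfilerConfig`, the derived class names, the `name()` hook, the two stub methods, the error message, and the `syncFails` helper are all assumptions made for the sketch.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical, heavily simplified config; the real one carries more state.
struct ProfilerConfig {
  bool enabled = false;
};

// Shared base: owns only the config and the callback handle.
class ProfilerThreadLocalStateBase {
 public:
  explicit ProfilerThreadLocalStateBase(const ProfilerConfig& config)
      : config_(config) {}
  virtual ~ProfilerThreadLocalStateBase() = default;

  const ProfilerConfig& config() const { return config_; }
  void setCallbackHandle(std::uint64_t handle) { handle_ = handle; }
  std::uint64_t callbackHandle() const { return handle_; }

  // Hypothetical hook so the sketch can tell the two profilers apart.
  virtual std::string name() const = 0;

 private:
  ProfilerConfig config_;
  std::uint64_t handle_ = 0;
};

// Each profiler derives from the base and keeps its own logic to itself.
class LegacyThreadLocalState : public ProfilerThreadLocalStateBase {
 public:
  using ProfilerThreadLocalStateBase::ProfilerThreadLocalStateBase;
  std::string name() const override { return "legacy"; }
};

class KinetoThreadLocalState : public ProfilerThreadLocalStateBase {
 public:
  using ProfilerThreadLocalStateBase::ProfilerThreadLocalStateBase;
  std::string name() const override { return "kineto"; }
};

// Pure virtual interface: declares what a CUDA backend must provide.
// The two methods here are placeholders for the real, larger interface.
struct CUDAStubs {
  virtual ~CUDAStubs() = default;
  virtual bool enabled() const = 0;
  virtual void synchronize() const = 0;
};

// The "always fail" behavior lives in a separate default implementation.
struct DefaultCUDAStubs : CUDAStubs {
  bool enabled() const override { return false; }
  void synchronize() const override {
    throw std::runtime_error("CUDA used in profiler but not enabled.");
  }
};

// Hypothetical helper: returns true if synchronize() throws on these stubs.
bool syncFails(const CUDAStubs& stubs) {
  try {
    stubs.synchronize();
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

With this split, code that only needs the config or the callback handle can work against the base class, and a caller that hits the default stubs gets an explicit failure rather than silently running without CUDA support.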
Test Plan: Ran the overhead microbenchmark.
Reviewed By: aaronenyeshi
Differential Revision: D32678163
fbshipit-source-id: 9b733283e4eae2614db68147de81b72f6094ce6c