[Profiler] Move python tracing to unified event type (Part 1)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78163
The python function tracer is complicated and separate from the other profile types, so I've chosen to break the change into two diff. The first (this one) reworks the cache structure to make it amenable to integration (as well as some other nice tweaks) and the next one actually moves it over.
The old cache scheme worked very hard to pack all the information about an event into a small struct via bit packing, with a couple secondary caches for things like names. Because of the space constraints on that struct (and the fact that it had to represent all call and return types) there were a lot of subtle invariants swirling around that made it hard to offload anything to a different component. The new cache system is more modular and also, as it turns out, a bit faster. (Benchmarks in part 2)
There is a more detailed description of the cache hierarchy in the PR, but the gist is that I use various specializations to handle the different event types (python call, nn module, c function) and lean on the type system to keep everything safe and organized. (One nice thing about using unique IDs is that they also implicitly encode the event type. They implicitly encode everything!) Given that we are going to want to expand the semantics (e.g. torch ops, DataLoader, etc) this will give a nice way to capture richer semantics without significantly increasing the complexity of the profiler.
Differential Revision: [D36379147](https://our.internmc.facebook.com/intern/diff/D36379147/)
Approved by: https://github.com/aaronenyeshi