[Profiler] Specialized AppendOnlyQueue (#73409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73409
We can do better than `vector` or `deque`, and it's sufficiently important to the hot path to justify a custom container. (This is part of the larger queue refactor, but this is a standalone drop-in replacement so we don't need to wait.)
Test Plan: It's a pretty simple container type, so I just added a few cpp tests for emplace and read back. I also ran the overhead benchmark (replicates=9) with both `--stressTestKineto` (0.99 -> 0.94 us) and `--stressTestKineto --kinetoProfileMemory` (1.36 -> 1.27 us).
Reviewed By: swolchok
Differential Revision: D34231072
fbshipit-source-id: ed57299729d444d59cf843a0d38a3ee2240eeec1
(cherry picked from commit 43907948f3a8d2137244e7bb59f43999bd660917)