pytorch
daf959c8 - [Profiler] Switch to thread local subqueues to reduce lock contention. (#74151)

Committed 2 years ago
[Profiler] Switch to thread local subqueues to reduce lock contention. (#74151)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74151

The first of several changes to move to an optimized recording data structure to back the profiler. This PR keeps the existing monolithic `OpEventData` struct, but splits storage into thread-local subqueues so we don't have to lock to insert.

Test Plan: Unit tests and benchmarks. The single-threaded benchmark is unchanged, and the multithreaded stress test dropped from ~21 us to ~6 us.

Reviewed By: chaekit

Differential Revision: D34720171

fbshipit-source-id: 90b5ebe618b91099e0a19c1f31cfcd8fe1c2ea12
(cherry picked from commit dfed7901ee329224f8fe0b42ef4981e396d918be)
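To illustrate the general idea of "thread local subqueues so we don't have to lock to insert", here is a minimal C++17 sketch. It is not the code from this PR. `OpEvent`, `SubQueue`, `RecordQueue`, and `record_from_threads` are hypothetical names chosen for illustration, and the sketch assumes that events are collected only after all recording threads have stopped. Each thread takes the shared mutex once, when it first registers its own subqueue. Every later insert appends to that thread's private storage with no lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a per-op profiler record (not PyTorch's OpEventData).
struct OpEvent {
  std::string name;
  uint64_t thread_tag;
  int64_t start_ns;
};

// Storage written by exactly one thread, so push() needs no synchronization.
class SubQueue {
 public:
  void push(OpEvent e) { events_.push_back(std::move(e)); }
  const std::vector<OpEvent>& events() const { return events_; }

 private:
  std::vector<OpEvent> events_;
};

// Hypothetical queue: one SubQueue per (thread, queue) pair.
class RecordQueue {
 public:
  RecordQueue() : id_(next_id_.fetch_add(1)) {}

  // Hot path: after a thread's first call, this never touches mu_.
  void record(OpEvent e) { local().push(std::move(e)); }

  // Assumption: call only after all recording threads have finished (e.g. joined).
  std::vector<OpEvent> collect() {
    std::lock_guard<std::mutex> guard(mu_);
    std::vector<OpEvent> out;
    for (const auto& sq : subqueues_) {
      out.insert(out.end(), sq->events().begin(), sq->events().end());
    }
    return out;
  }

 private:
  SubQueue& local() {
    // Per-thread cache keyed by a unique queue id, so multiple queues can coexist.
    thread_local std::unordered_map<uint64_t, SubQueue*> cache;
    auto it = cache.find(id_);
    if (it != cache.end()) return *it->second;
    // Slow path, once per thread: register a new subqueue under the lock.
    std::lock_guard<std::mutex> guard(mu_);
    subqueues_.push_back(std::make_unique<SubQueue>());  // unique_ptr keeps the address stable
    SubQueue* sq = subqueues_.back().get();
    cache.emplace(id_, sq);
    return *sq;
  }

  static inline std::atomic<uint64_t> next_id_{0};
  const uint64_t id_;
  std::mutex mu_;
  std::vector<std::unique_ptr<SubQueue>> subqueues_;
};

// Usage: workers record concurrently, are joined, then results are merged.
std::size_t record_from_threads(RecordQueue& q, int num_threads, int events_per_thread) {
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&q, t, events_per_thread] {
      for (int i = 0; i < events_per_thread; ++i) {
        q.record(OpEvent{"example_op", static_cast<uint64_t>(t), i});
      }
    });
  }
  for (auto& w : workers) w.join();  // join() orders all pushes before collect()
  return q.collect().size();
}
```

In this sketch the mutex is not removed. It is only taken on a thread's first insert and at collection time, so threads that record many events no longer contend on every insert. That is the property the commit message credits for the multithreaded stress-test improvement.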
Author: Taylor Robie