pytorch
daf959c8 - [Profiler] Switch to thread local subqueues to reduce lock contention. (#74151)

Commit

2 years ago

[Profiler] Switch to thread local subqueues to reduce lock contention. (#74151)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74151

The first of several changes to move to an optimized recording data structure to back profiler. This PR keeps the existing monolithic `OpEventData` struct, but splits storage into thread local subqueues so we don't have to lock to insert.

Test Plan: Unit tests and benchmarks. The single threaded benchmark is unchanged, and the multithreaded stress test dropped from ~21 us to ~6 us.

Reviewed By: chaekit

Differential Revision: D34720171

fbshipit-source-id: 90b5ebe618b91099e0a19c1f31cfcd8fe1c2ea12

(cherry picked from commit dfed7901ee329224f8fe0b42ef4981e396d918be)
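The idea described in the commit message (each thread appends to its own subqueue, so the insert path takes no lock) can be illustrated with a minimal sketch. This is not the code from the PR: `Event`, `EventRecorder`, `SubQueue`, and `run_demo` are hypothetical names chosen for illustration, and `Event` only stands in for the monolithic `OpEventData` struct. The sketch assumes events are collected only after all recording threads have been joined.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the profiler's monolithic event record.
struct Event {
  std::string name;
  std::uint64_t subqueue_index;
};

class EventRecorder {
 public:
  // Hot path: append to the calling thread's own subqueue. The registry lock
  // is taken only the first time a given thread records into this recorder.
  void record(std::string name) {
    SubQueue* q = local_subqueue();
    q->events.push_back(Event{std::move(name), q->index});
  }

  // Cold path: merge every subqueue into one vector.
  // Assumption: all recording threads have finished (e.g. were joined).
  std::vector<Event> collect() {
    std::lock_guard<std::mutex> guard(registry_mutex_);
    std::vector<Event> out;
    for (const auto& q : subqueues_) {
      out.insert(out.end(), q->events.begin(), q->events.end());
    }
    return out;
  }

 private:
  struct SubQueue {
    std::uint64_t index;
    std::vector<Event> events;
  };

  static std::uint64_t next_id() {
    static std::atomic<std::uint64_t> counter{1};
    return counter.fetch_add(1);
  }

  SubQueue* local_subqueue() {
    // Per-thread cache of the subqueue for the most recently used recorder.
    // A thread alternating between recorders would register a fresh subqueue
    // on each switch; that is acceptable for a sketch but not for production.
    struct Cache {
      std::uint64_t owner_id = 0;
      SubQueue* queue = nullptr;
    };
    thread_local Cache cache;
    if (cache.owner_id != id_) {
      std::lock_guard<std::mutex> guard(registry_mutex_);
      subqueues_.push_back(std::make_unique<SubQueue>());
      subqueues_.back()->index = subqueues_.size() - 1;
      cache.owner_id = id_;
      cache.queue = subqueues_.back().get();
    }
    assert(cache.queue != nullptr);
    return cache.queue;
  }

  const std::uint64_t id_ = next_id();  // unique per recorder instance
  std::mutex registry_mutex_;           // guards subqueues_ only
  std::vector<std::unique_ptr<SubQueue>> subqueues_;
};

// Records events from several threads and returns how many were collected.
std::size_t run_demo(std::size_t n_threads, std::size_t events_per_thread) {
  EventRecorder recorder;
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < n_threads; ++t) {
    workers.emplace_back([&recorder, events_per_thread] {
      for (std::size_t i = 0; i < events_per_thread; ++i) {
        recorder.record("op");
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return recorder.collect().size();
}
```

Calling `run_demo(4, 1000)` records from four threads and collects all 4000 events. The design trade-off is that the mutex moves from every insert to one registration per thread plus the final merge, which is plausibly why contention drops in a multithreaded stress test while a single threaded benchmark stays flat.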

References

#74332 - Merge master into lazy_tensor_staging

Author

Taylor Robie

Committer

pytorchmergebot

Parents

11dc1581
