7d85e770 - Use atomic operations to manipulate current RPC agent (#39663)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39663

I was investigating a memory corruption issue and thought it might be due to a race condition in (un)setting the current RPC agent. It turns out it wasn't (still investigating...). I had already written this fix, and it is a real fix (there could really be a race condition), so I'm sending it out to see whether there's interest in merging it. I believe its practical usefulness is very limited, however, since the current RPC agent is typically only changed twice (at start and at shutdown), so there is little opportunity for a race.

As there may be some confusion about the atomicity of shared_ptrs, let me clarify a few things from the get-go. Operations on the control block of a shared_ptr (i.e., increasing and decreasing the refcount) are atomic, which means it is safe to manipulate *two different* shared_ptrs that point to the *same* object from *different* threads. However, the shared_ptr object itself is not atomic, which means it is *not* safe to manipulate the *same* shared_ptr from two *different* threads. For that reason, the STL provides atomic functions explicitly specialized for shared_ptrs: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic (in C++20, they are being replaced by a specialization of std::atomic<std::shared_ptr<T>>). Note that this has been called "the worst question of all of C++" by Louis Brandy in his CppCon talk: https://youtu.be/lkgszkPnV8g?t=1210

ghstack-source-id: 105475005

Test Plan: Unit tests

Differential Revision: D21932817

fbshipit-source-id: da33fedd98efb820f284583ce7ff1c1c531dea9c
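A minimal sketch of the pre-C++20 pattern the commit describes: a global shared_ptr that several threads may read and (un)set, guarded with the shared_ptr atomic free functions. The RpcAgent struct and the setter/getter names here are illustrative stand-ins, not the actual PyTorch API:

    #include <memory>
    #include <utility>

    // Illustrative stand-in for the real RPC agent type (not the PyTorch class).
    struct RpcAgent {};

    // One global shared_ptr read and (un)set from multiple threads. Plain
    // assignments and reads of this variable would be a data race, even
    // though the refcount updates inside shared_ptr are themselves atomic.
    std::shared_ptr<RpcAgent> currentAgent;

    // Publish a new agent (or nullptr at shutdown) atomically.
    void setCurrentAgent(std::shared_ptr<RpcAgent> agent) {
      std::atomic_store(&currentAgent, std::move(agent));
    }

    // Take an atomic snapshot that keeps the agent alive even if another
    // thread unsets it concurrently.
    std::shared_ptr<RpcAgent> getCurrentAgent() {
      return std::atomic_load(&currentAgent);
    }

In C++20 the same idea can be expressed with the specialization mentioned above: declare std::atomic<std::shared_ptr<RpcAgent>> currentAgent; and use currentAgent.store(agent) and currentAgent.load() instead of the free functions.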
Author: lw