pytorch
f8f756ef - TCPStore add watchKey method and new listener thread (#54264)

Commit
3 years ago
TCPStore add watchKey method and new listener thread (#54264)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54264

**Changes**
- Creates a new listener thread on each client to run the callback
- Creates a new class from which the listener thread and master thread derive; this class handles shutdown and cleanup of the thread on Windows and Linux
- Adds the watchKey method and updates any functions that change the key value

**Background**
This PR adds functionality to TCPStore that lets users watch a key and execute a callback when the key changes. It introduces a new watchKey() API:
`TCPStore::watchKey(const std::string& key, std::function<void(std::string, std::string)> callback)`
which takes the parameters `key` and `callback(old_key, new_key)`, the callback to run on key change.

The current methods are blocking. For example, in `TCPStore::get()` a worker sends a "get key" request to the master, waits for the response, and only then exits the function and returns the value to the user. We therefore need a non-blocking, asynchronous way to execute the callback whenever a key changes. This is done by creating a new listener thread on each client, which the master can communicate with.

Right now the API is C++ only and only for TCPStore. The internal use case is elastic RPC: we will have an internal key such as `_NumNodes`, and all nodes in the elastic RPC group will watch this key. When a node leaves, this key will be updated and each node will execute a callback to clean up the Autograd context and RRef context.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D27709912

Pulled By: H-Huang

fbshipit-source-id: 619aa3b2a8eb23f4be5f5736efdcca6c175aadf3
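The watch-and-callback pattern described in the Background can be sketched in isolation. The block below is a minimal, in-process illustration and not the TCPStore implementation: the class `ToyWatchStore`, its `flush()` helper, and the function `demoWatch()` are hypothetical names invented for this example, and no sockets or master/worker split are involved. It only shows the core idea, assuming a single process: `set()` enqueues a change event, and a dedicated listener thread runs the registered callbacks so the caller is never blocked by them.

```cpp
// Illustrative sketch only: an in-process stand-in for the watch-key pattern.
// ToyWatchStore, flush(), and demoWatch() are hypothetical, not PyTorch APIs.
#include <cassert>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

class ToyWatchStore {
 public:
  using Callback = std::function<void(std::string, std::string)>;

  ToyWatchStore() : listener_([this] { listenLoop(); }) {}

  ~ToyWatchStore() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    listener_.join();  // listener drains remaining events, then exits
  }

  // Register a callback to run (on the listener thread) when `key` changes.
  void watchKey(const std::string& key, Callback cb) {
    std::lock_guard<std::mutex> lock(mu_);
    watchers_[key].push_back(std::move(cb));
  }

  // Update the value and, if the key is watched, queue a change event.
  void set(const std::string& key, const std::string& value) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      std::string old = data_[key];
      data_[key] = value;
      if (watchers_.count(key) == 0) return;
      pending_.push(Event{key, old, value});
    }
    cv_.notify_all();
  }

  std::string get(const std::string& key) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = data_.find(key);
    return it == data_.end() ? std::string() : it->second;
  }

  // Block until every queued event has been delivered to its callbacks.
  void flush() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return pending_.empty() && !busy_; });
  }

 private:
  struct Event {
    std::string key, oldValue, newValue;
  };

  void listenLoop() {
    std::unique_lock<std::mutex> lock(mu_);
    while (true) {
      cv_.wait(lock, [this] { return stop_ || !pending_.empty(); });
      if (pending_.empty()) return;  // stop requested and nothing left to run
      Event ev = pending_.front();
      pending_.pop();
      std::vector<Callback> cbs = watchers_[ev.key];
      busy_ = true;
      lock.unlock();  // run callbacks without holding the store lock
      for (auto& cb : cbs) cb(ev.oldValue, ev.newValue);
      lock.lock();
      busy_ = false;
      cv_.notify_all();  // wake any flush() waiter
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::map<std::string, std::string> data_;
  std::map<std::string, std::vector<Callback>> watchers_;
  std::queue<Event> pending_;
  bool stop_ = false;
  bool busy_ = false;
  std::thread listener_;  // declared last so the other members exist first
};

// Usage sketch: watch a "_NumNodes"-style key and record each change.
std::vector<std::string> demoWatch() {
  std::vector<std::string> seen;
  std::mutex seenMu;
  ToyWatchStore store;
  store.set("_NumNodes", "3");  // no watcher yet, so no callback fires
  store.watchKey("_NumNodes", [&](std::string oldV, std::string newV) {
    std::lock_guard<std::mutex> lock(seenMu);
    seen.push_back(oldV + "->" + newV);
  });
  store.set("_NumNodes", "2");  // returns immediately; callback runs async
  store.flush();
  assert(store.get("_NumNodes") == "2");
  std::lock_guard<std::mutex> lock(seenMu);
  return seen;
}
```

In this sketch the callback receives the old and new values of the watched key. Running callbacks on a separate thread and outside the store lock is the design choice that keeps `set()` and `get()` from being blocked by slow callbacks; the real change does the equivalent across processes, with the master notifying each client's listener thread.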