d9b4788e - cleanup dist autograd context on other nodes when it is released on one node (#27951)

Cleanup dist autograd context on other nodes when it is released on one node (#27951)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27951

We want to clean up the distributed autograd context on all other nodes once a single node is done (i.e. it has exited the context manager `with dist_autograd.context() as context_id: ...`). This PR implements that as follows:

1) Add classes to encapsulate the message requesting the context release and the corresponding response.
2) Handle this request in `request_callback_impl.cpp`: when the request is received, look up the context for the given context_id and release it.
3) Add an RPC call in `DistAutogradContainer::releaseContext` to send this command. It currently does not wait for an ack or implement any retrying. The RPC is sent to all the workerIds this context has come into contact with (tracked via https://github.com/pytorch/pytorch/pull/26324).
4) Add the relevant unit tests.

Follow-up PRs will add error checking and retries for this call.

ghstack-source-id: 92269279

Test Plan: Added/modified unit tests in `test/dist_autograd_test.py`

Differential Revision: D17920137

fbshipit-source-id: 7403512ab5fcbc28d21c548b2e45319dd472e26a
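
A minimal usage sketch (not part of this commit) of the point at which the release path above is triggered, namely exiting the context manager. The worker names, tensor shapes, and two-process setup are assumptions for illustration, and it uses the current public `dist_autograd`/`rpc` API:

```python
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc

# Assumes MASTER_ADDR/MASTER_PORT are set and a peer process has called
# rpc.init_rpc("worker1", rank=1, world_size=2).
rpc.init_rpc("worker0", rank=0, world_size=2)

with dist_autograd.context() as context_id:
    t1 = torch.rand((3, 3), requires_grad=True)
    t2 = torch.rand((3, 3), requires_grad=True)
    # The remote op makes "worker1" a participant in this autograd context.
    loss = rpc.rpc_sync("worker1", torch.add, args=(t1, t2)).sum()
    dist_autograd.backward(context_id, [loss])
    grads = dist_autograd.get_gradients(context_id)
# Exiting the `with` block releases the context locally; with this change,
# DistAutogradContainer::releaseContext also sends a release RPC to every
# known participating worker (here "worker1") so it can free its copy of
# the context as well.

rpc.shutdown()
```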