pytorch
1322daa5 - Improve error handling for distributed autograd engine. (#27940)

Commit

5 years ago

Improve error handling for distributed autograd engine. (#27940)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27940

1) If we receive an error for outstanding rpcs, we enqueue an appropriate error on the local autograd engine.
2) Add an `exit_on_error` mode for the local autograd engine, where the computation stops if we see an error.

ghstack-source-id: 92603377

Test Plan: Added unit tests to test failures.

Differential Revision: D17916844

fbshipit-source-id: 199a7832f1033c36a9bbcc1e80d86576c04965d0
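
The two behaviours described in the summary can be illustrated with a toy task queue. This is only a minimal sketch under stated assumptions: `ToyEngine`, `enqueue_error`, and `run` are hypothetical names invented for illustration, not PyTorch's C++ engine or any of its APIs. Only the `exit_on_error` name comes from the commit message, and the real implementation may differ substantially.

```python
import queue


class ToyEngine:
    """Hypothetical stand-in for a local autograd engine (not PyTorch's API)."""

    def __init__(self, exit_on_error=False):
        self.exit_on_error = exit_on_error
        self.tasks = queue.Queue()
        self.executed = []  # names of tasks that completed
        self.errors = []    # messages of errors that were seen

    def enqueue(self, name, fn):
        self.tasks.put((name, fn))

    def enqueue_error(self, message):
        # Point 1: a failed outstanding RPC is surfaced locally as a task
        # that raises when the engine reaches it.
        def fail():
            raise RuntimeError(message)

        self.tasks.put(("rpc_error", fail))

    def run(self):
        while not self.tasks.empty():
            name, fn = self.tasks.get()
            try:
                fn()
                self.executed.append(name)
            except RuntimeError as exc:
                self.errors.append(str(exc))
                if self.exit_on_error:
                    # Point 2: stop the computation at the first error.
                    break
        return self.executed, self.errors


def demo(exit_on_error):
    engine = ToyEngine(exit_on_error=exit_on_error)
    engine.enqueue("a", lambda: None)
    engine.enqueue_error("rpc failed")
    engine.enqueue("b", lambda: None)
    return engine.run()
```

In this sketch, `demo(True)` stops before task `b` runs, while `demo(False)` records the error and keeps draining the queue.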

Author

pritamdamania

Committer

facebook-github-bot

Parents

dc17a2ec
