Catch exception in distributed engine callbacks. (#36118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36118
Callbacks registered with the autograd engine Future in the
distributed engine contain a non-trivial amount of business logic. It's entirely
possible for these callbacks to throw exceptions, in which case the exception
is not propagated back to the client (since the appropriate Future is never
marked as completed).
In this PR, I've added try-catch blocks to these callbacks so that, if one
throws, the appropriate Future is always marked with an error.
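For illustration only, here is a rough Python analogue of the pattern using
concurrent.futures. It is not the C++ code touched by this PR; the helper name
chain_with_error_propagation and its parameters are hypothetical. The sketch
assumes a callback on an inner Future is responsible for completing an outer
Future, and shows how catching exceptions inside the callback keeps the outer
Future from being left pending.

```python
from concurrent.futures import Future


def chain_with_error_propagation(inner, transform):
    """Return a new Future completed from `inner` via `transform`.

    Hypothetical helper: any exception raised while the callback runs is
    stored on the returned Future instead of being lost.
    """
    outer = Future()

    def callback(done):
        try:
            # Business logic that may throw. done.result() also re-raises
            # any error already stored on the inner Future.
            value = transform(done.result())
        except Exception as exc:
            # Without this branch, `outer` would never be completed and a
            # caller waiting on it would never see the failure.
            outer.set_exception(exc)
        else:
            outer.set_result(value)

    inner.add_done_callback(callback)
    return outer
```

A caller waiting on the returned Future then observes either the transformed
value or the exception, whichever the callback produced.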
ghstack-source-id: 101904294
Test Plan: Tested by simulating an exception.
Differential Revision: D20885521
fbshipit-source-id: b6b6f5994a5fb439e40ec7c585435b6dfe7ddb8e