Use std::shared_ptr for DistAutogradContext. (#29770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29770
We were passing around const and non-const references to
DistAutogradContext from DistAutogradContainer. This wasn't safe, since the
context could be deleted from the container while a thread was still using
the reference. This would usually happen when a backward pass fails on the
node driving the backward pass (resulting in delete context messages being
sent to all nodes) while other nodes are still executing code related to that
autograd context.
This was also the reason why `test_backward_autograd_engine_error` was flaky.
Using a std::shared_ptr everywhere ensures the context stays alive for as long
as any thread still holds it, so deleting it from the container can no longer
cause a crash.
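
For illustration, a minimal sketch of the pattern (the class and method names
below, e.g. `getOrCreate`/`release`, are simplified stand-ins, not the actual
PyTorch API): handing out a `std::shared_ptr` copy instead of a reference keeps
the context valid even if another thread erases it from the container.

```cpp
#include <cstdint>
#include <iostream>
#include <memory>
#include <mutex>
#include <unordered_map>

// Stand-in for DistAutogradContext; fields are illustrative only.
class Context {
 public:
  explicit Context(int64_t id) : id_(id) {}
  int64_t id() const { return id_; }

 private:
  int64_t id_;
};

// Stand-in for DistAutogradContainer.
class Container {
 public:
  std::shared_ptr<Context> getOrCreate(int64_t id) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = contexts_.find(id);
    if (it == contexts_.end()) {
      it = contexts_.emplace(id, std::make_shared<Context>(id)).first;
    }
    return it->second;  // returns a shared_ptr copy, not a reference
  }

  // Simulates the "delete context" message sent when a backward pass fails.
  void release(int64_t id) {
    std::lock_guard<std::mutex> guard(mutex_);
    contexts_.erase(id);  // other holders keep the context alive
  }

 private:
  std::mutex mutex_;
  std::unordered_map<int64_t, std::shared_ptr<Context>> contexts_;
};

int main() {
  Container container;
  auto ctx = container.getOrCreate(42);  // a worker grabs the context
  container.release(42);                 // context removed from the container
  // ctx is still valid here; with a plain reference this would be a
  // use-after-free.
  std::cout << "context " << ctx->id() << " still alive\n";
}
```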
Closes #28928
Closes #26922
ghstack-source-id: 94201446
Differential Revision: D18494814
fbshipit-source-id: 0c925fdbd5755f6d876dad56885e2cbaf41fc5f0