Make RRefContext singleton leaky, deal with module destruct order race. (#30172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30172
RRefContext is a conventional singleton, used by rref.cpp. At module teardown
time, the order in which static state from rref_context.cpp and rref.cpp is
destroyed is not defined. We were observing a SIGSEGV when RRefContext was
destroyed before a dangling ~UserRRef() call could execute. Specifically, the
underlying ctx.agent()->getWorkerInfo(ownerId_) call failed.
This change only avoids the SIGSEGV by intentionally leaking the singleton; we
still need to determine why there is a dangling UserRRef at module destruction
time.
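For reference, a minimal sketch of the leaky-singleton pattern described above.
This is not the actual RRefContext code: Context, User, workerName() and
lastReleased() are hypothetical stand-ins chosen for illustration. The idea is
that the instance is heap-allocated once and never deleted, so a destructor
that runs late in teardown can still reach it.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for RRefContext (illustrative names only).
class Context {
 public:
  // Leaky singleton: allocated on first use and intentionally never deleted,
  // so no destructor runs for it during static destruction.
  static Context& getInstance() {
    static Context* instance = new Context();
    return *instance;
  }

  // Hypothetical lookup, standing in for agent()->getWorkerInfo(ownerId_).
  std::string workerName(int id) const {
    return "worker" + std::to_string(id);
  }

  Context(const Context&) = delete;
  Context& operator=(const Context&) = delete;

 private:
  Context() = default;
};

// Hypothetical stand-in for UserRRef: its destructor calls into the context.
class User {
 public:
  explicit User(int ownerId) : ownerId_(ownerId) {}

  ~User() {
    // Safe even if this runs during teardown, because Context is never
    // destroyed.
    lastReleased() = Context::getInstance().workerName(ownerId_);
  }

  // Records the last lookup result; leaked for the same reason as Context.
  static std::string& lastReleased() {
    static std::string* value = new std::string();
    return *value;
  }

 private:
  int ownerId_;
};
```

The trade-off is that the singleton's memory is reclaimed only by the OS at
process exit, and leak checkers may need a suppression for it.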
ghstack-source-id: 94287441
Test Plan:
existing test suite
test_elastic_averaging in the context of D18511430, where the segfault reproduced reliably.
Differential Revision: D18620786
fbshipit-source-id: 17b6ccc0eb1724b579a68615e4afb8e9672b0662