[PyTorch-RPC] In process_group_agent, avoid read-after-free (#35252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35252
The torch::from_blob() overload without a deleter is relatively
dangerous: it assumes that the caller will keep the underlying buffer
alive for as long as the resulting tensor is in use.
We were at one point correctly persisting the send tensor bits in
process_group_agent, but with the early-return codepaths we are no
longer doing so.
This change switches to a more robust approach where we instead just use
the torch::from_blob-with-deleter syntax, and use std::move to avoid
a copy. There's an extra malloc, but that's effectively free compared with
the rest of the work involved here. And it means we don't have to worry
about the Tensor memory vanishing from underneath the send anymore.
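For illustration only, a minimal sketch of the ownership pattern described
above, written against the standard library rather than libtorch. BlobView,
makeOwningView and buildSendView are hypothetical names and are not code from
this change; BlobView merely stands in for a tensor built by
torch::from_blob with a deleter. The payload is moved into a heap-allocated
holder (the extra malloc), and the deleter frees that holder when the view
goes away.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a tensor created via from_blob-with-deleter:
// a non-owning view of bytes plus a callback that runs on destruction.
class BlobView {
 public:
  BlobView(const char* data, std::size_t size, std::function<void()> deleter)
      : data_(data), size_(size), deleter_(std::move(deleter)) {}

  BlobView(const BlobView&) = delete;
  BlobView& operator=(const BlobView&) = delete;

  BlobView(BlobView&& other) noexcept
      : data_(other.data_),
        size_(other.size_),
        deleter_(std::move(other.deleter_)) {
    // A moved-from std::function is unspecified, so clear it explicitly
    // to make sure the deleter runs exactly once.
    other.deleter_ = nullptr;
  }

  ~BlobView() {
    if (deleter_) {
      deleter_();
    }
  }

  const char* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  const char* data_;
  std::size_t size_;
  std::function<void()> deleter_;
};

// Move the payload into a heap-allocated holder (one extra allocation,
// no copy of the bytes) and tie the holder's lifetime to the view.
BlobView makeOwningView(std::string payload) {
  auto* holder = new std::string(std::move(payload));
  return BlobView(holder->data(), holder->size(), [holder] { delete holder; });
}

// Mimics an early-return codepath: the local payload goes out of scope
// before the caller reads the view, yet the bytes stay valid.
BlobView buildSendView() {
  std::string payload = "serialized message";
  return makeOwningView(std::move(payload));
}
```

With this shape, the caller of buildSendView() can read the bytes after the
local payload is gone, because the view itself keeps the holder alive.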
The initial motivation here was dist_autograd_node_failure flakiness.
While the motivating case is handleSend(), we also fix handlePendingMessage().
ghstack-source-id: 100704883
Test Plan:
existing test coverage, e.g.
buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/test:ProcessGroupAgentTest
Differential Revision: D20607028
fbshipit-source-id: cf9966c5aa9472830cfefaf7fc2f92af9b52630d