pytorch
340ccf56 - [PyTorch-RPC] In process_group_agent, avoid read-after-free (#35252)


Commit

4 years ago

[PyTorch-RPC] In process_group_agent, avoid read-after-free (#35252)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35252

The torch::from_blob() form without a deleter is relatively dangerous: it explicitly assumes that the caller will keep the tensor bits alive for as long as necessary. At one point we were correctly persisting the send tensor bits in process_group_agent, but with the early-return codepaths we no longer do so.

This change switches to a more robust approach: use the torch::from_blob-with-deleter form, and use std::move to avoid a copy. There's an extra malloc, but that's effectively free compared with the rest of the work involved here, and it means we no longer have to worry about the Tensor memory vanishing from underneath the send.

The initial motivation here was dist_autograd_node_failure flakiness. While the motivating case is handleSend(), we also fix handlePendingMessage().

ghstack-source-id: 100704883

Test Plan: existing test coverage, e.g. buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/test:ProcessGroupAgentTest

Differential Revision: D20607028

fbshipit-source-id: cf9966c5aa9472830cfefaf7fc2f92af9b52630d
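
The ownership pattern described in the summary can be sketched without libtorch. This is only an illustration, not the code from this commit: `BlobView` is a hypothetical stand-in for a tensor created over external memory with a deleter, and `makeOwningView` and its `freed` counter are invented names for the demo. The idea is that the payload is moved into a heap-allocated holder (the "extra malloc"), and the view's deleter frees that holder once the last copy of the view is gone.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a tensor built over external memory.
// It never copies the bytes; it only runs `deleter` once the last
// copy of the view is destroyed.
class BlobView {
 private:
  // Runs the stored callback exactly once, when the guard is destroyed.
  struct Guard {
    explicit Guard(std::function<void()> fn) : fn_(std::move(fn)) {}
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;
    ~Guard() {
      if (fn_) fn_();
    }
    std::function<void()> fn_;
  };

 public:
  BlobView(const char* data, std::size_t size, std::function<void()> deleter)
      : data_(data),
        size_(size),
        guard_(std::make_shared<Guard>(std::move(deleter))) {}

  // Copies the viewed bytes out, which is only safe while the storage lives.
  std::string str() const { return std::string(data_, size_); }

 private:
  const char* data_;
  std::size_t size_;
  std::shared_ptr<Guard> guard_;  // shared by all copies of this view
};

// Moves `payload` into a heap-allocated holder and returns a view whose
// deleter frees that holder. `freed` is a demo-only counter (an assumption
// of this sketch) that is incremented when the deleter runs.
BlobView makeOwningView(std::string payload, int* freed = nullptr) {
  auto* holder = new std::string(std::move(payload));  // the extra allocation
  return BlobView(holder->data(), holder->size(), [holder, freed] {
    delete holder;
    if (freed) ++*freed;
  });
}
```

Because the view carries its own storage, an early return in the caller can no longer free the bytes while a send still reads them; the storage is released only after every copy of the view has been destroyed.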

Author

jjlilley

Committer

facebook-github-bot

Parents

fddcd72a
