Let JIT unpickler accept CUDA DataPtr from read_record_ (#46827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46827
The TensorPipe RPC agent uses the JIT pickler/unpickler to serialize/deserialize
tensors. Instead of saving tensors to a file, the agent can directly
invoke `cudaMemcpy` to copy tensors from the sender to the receiver
before calling into JIT unpickler. As a result, before unpickling,
the agent might already have allocated tensors and need to pass
them to the JIT unpickler. Currently, this is done by providing a
`read_record` lambda to the unpickler for CPU tensors, but this is
no longer sufficient for zero-copy CUDA tensors, as the unpickler
always allocates the tensor on CPU.
To address this problem, this commit introduces a `use_storage_device`
flag to the unpickler ctor. When this flag is set, the unpickler will
use the device from the `DataPtr` returned by the `read_record`
lambda to override the pickled device information, thereby
achieving zero-copy.
Test Plan: Imported from OSS
Reviewed By: wanchaol
Differential Revision: D24533218
Pulled By: mrshenli
fbshipit-source-id: 35acd33fcfb11b1c724f855048cfd7b2991f8903