Cache the DataPtrs in CUDAFuture (#48788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48788
CUDAFuture needs to inspect the value it contains for two reasons: first, to determine which devices its tensors reside on (so that it can record events on those devices); second, to record those tensors with the caching allocator when they are used on other streams. Extracting the data ptrs can be somewhat expensive (especially if we resort to using the pickler to do it), so it's worth caching the result the first time we compute it.
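A minimal sketch of the caching pattern, independent of the actual CUDAFuture code: the expensive extraction runs at most once, and later callers reuse the cached vector. All names here (`FakeTensor`, `CachedFuture`, `extractDataPtrs`) are illustrative stand-ins, not the real API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a tensor that exposes an opaque data pointer.
struct FakeTensor {
  void* data;
};

// Illustrative sketch: cache the extracted data pointers on first use.
class CachedFuture {
 public:
  explicit CachedFuture(std::vector<FakeTensor> value)
      : value_(std::move(value)) {}

  // How many times the (expensive) extraction actually ran.
  int extractionCount() const { return extractions_; }

  const std::vector<void*>& dataPtrs() {
    if (!cached_) {
      dataPtrs_ = extractDataPtrs();  // expensive: performed at most once
      cached_ = true;
    }
    return dataPtrs_;
  }

 private:
  std::vector<void*> extractDataPtrs() {
    ++extractions_;  // instrumentation for the sketch only
    std::vector<void*> ptrs;
    ptrs.reserve(value_.size());
    for (const auto& t : value_) {
      ptrs.push_back(t.data);
    }
    return ptrs;
  }

  std::vector<FakeTensor> value_;
  std::vector<void*> dataPtrs_;
  bool cached_ = false;
  int extractions_ = 0;
};
```

In the sketch the cache is filled lazily on the first call to `dataPtrs()`; repeated calls return the same vector without re-walking the value, which is the effect this change aims for when the future's value is inspected both for device discovery and for allocator recording.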
ghstack-source-id: 118180023
Test Plan: Unit tests
Reviewed By: mrshenli
Differential Revision: D25303486
fbshipit-source-id: 5c541640f6d19249dfb5489ba5e8fad2502836fb