Ensure devices are preserved when forwarding between futures (#57432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57432
In several places we created a future and then "forwarded" the value of another future to it once that other future completed, either to convert the type of the value or to "merge" multiple futures into one. However, when doing so we often created the child future with an empty set of devices, which meant it didn't support CUDA and would cause a silent synchronization/correctness bug if the parent future actually contained CUDA tensors. The child future now preserves the parent's devices.
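The forwarding pattern can be sketched roughly as follows. This is a toy model in plain Python, not the code touched by this PR: `ToyFuture` and `forward_converted` are hypothetical names, and devices are modeled as a plain set of strings rather than real device objects.

```python
from typing import Any, Callable, FrozenSet, List


class ToyFuture:
    """Hypothetical device-aware future; a stand-in, not the PyTorch class."""

    def __init__(self, devices: FrozenSet[str] = frozenset()) -> None:
        self.devices = devices
        self._done = False
        self._value: Any = None
        self._callbacks: List[Callable[["ToyFuture"], None]] = []

    def set_value(self, value: Any) -> None:
        # Mark the future complete, then run the callbacks registered so far.
        self._value = value
        self._done = True
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(self)

    def add_callback(self, callback: Callable[["ToyFuture"], None]) -> None:
        # Run immediately if already complete, otherwise defer.
        if self._done:
            callback(self)
        else:
            self._callbacks.append(callback)

    def value(self) -> Any:
        if not self._done:
            raise RuntimeError("future is not complete")
        return self._value


def forward_converted(parent: ToyFuture, convert: Callable[[Any], Any]) -> ToyFuture:
    """Return a child future that receives the parent's converted value."""
    # Inherit the parent's devices. Constructing ToyFuture() here instead
    # would reproduce the empty-device-set pattern described above.
    child = ToyFuture(devices=parent.devices)
    parent.add_callback(lambda done: child.set_value(convert(done.value())))
    return child
```

In this sketch a child built with `forward_converted(parent, str)` reports the same `devices` as `parent`, whereas a child built with a bare `ToyFuture()` would report none.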
One way this could have been caught earlier would have been to have Future always extract the DataPtrs, even in CPU-only mode, in order to ensure they always reside on the expected set of devices. Unfortunately this might have some adverse perf effects and thus should be done carefully.
ghstack-source-id: 128184667
Test Plan: eyes
Reviewed By: mrshenli
Differential Revision: D28143045
fbshipit-source-id: 9af1abf270366dc1df0d4857d6a8cc73668af9d1