Use Devices instead of DeviceIndexes in Future (#57353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57353
Even though we merged CUDAFuture into ivalue::Future, the resulting methods still had essentially two distinct codepaths: an "early exit" if `impl_ == nullptr` for CPU, followed by the CUDA-specific code. This works, but it risks divergence and inconsistencies when the same class is used in those two modes. Ideally both modes should share the same codepath, with the stream operations being no-ops for CPU. Luckily, this is exactly what happens when using a CPU DeviceGuardImplInterface!
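Hence this change makes Future use the CPU DeviceGuardImplInterface in the CPU case, so that both modes go through the same codepath. For convenience it also uses c10::Devices instead of c10::DeviceIndexes (like we did in https://github.com/pytorch/pytorch/pull/57294 for RPC).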
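To illustrate the idea of a single codepath with no-op stream operations on CPU, here is a minimal standalone sketch. It is not the actual ivalue::Future code: `SketchFuture`, `GuardImpl`, `CpuGuardImpl`, `FakeAccelGuardImpl` and the string log are hypothetical stand-ins, and the real DeviceGuardImplInterface has a different and much larger API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not the real c10 types.
enum class DeviceType { CPU, FakeAccel };

struct Device {
  DeviceType type;
  int index;
};

// Simplified analogue of a device guard implementation interface.
struct GuardImpl {
  virtual ~GuardImpl() = default;
  virtual void recordEvent(const Device& d, std::vector<std::string>& log) const = 0;
  virtual void blockOnEvent(const Device& d, std::vector<std::string>& log) const = 0;
};

// CPU has no streams, so every stream operation is a no-op.
struct CpuGuardImpl final : GuardImpl {
  void recordEvent(const Device&, std::vector<std::string>&) const override {}
  void blockOnEvent(const Device&, std::vector<std::string>&) const override {}
};

// A fake accelerator that logs the stream operations it would perform.
struct FakeAccelGuardImpl final : GuardImpl {
  void recordEvent(const Device& d, std::vector<std::string>& log) const override {
    log.push_back("record:" + std::to_string(d.index));
  }
  void blockOnEvent(const Device& d, std::vector<std::string>& log) const override {
    log.push_back("block:" + std::to_string(d.index));
  }
};

class SketchFuture {
 public:
  explicit SketchFuture(std::vector<Device> devices)
      : devices_(std::move(devices)), impl_(makeImpl(devices_)) {}

  // Same codepath in both modes: no `impl_ == nullptr` early exit.
  void markCompleted() {
    for (const Device& d : devices_) impl_->recordEvent(d, log_);
    completed_ = true;
  }

  void wait() {
    for (const Device& d : devices_) impl_->blockOnEvent(d, log_);
  }

  bool completed() const { return completed_; }
  const std::vector<std::string>& log() const { return log_; }

 private:
  // Assumption of this sketch: no devices (or a CPU device) selects the CPU impl.
  static std::unique_ptr<GuardImpl> makeImpl(const std::vector<Device>& devices) {
    if (devices.empty() || devices.front().type == DeviceType::CPU) {
      return std::make_unique<CpuGuardImpl>();
    }
    return std::make_unique<FakeAccelGuardImpl>();
  }

  std::vector<Device> devices_;  // declared before impl_, which is built from it
  std::unique_ptr<GuardImpl> impl_;
  std::vector<std::string> log_;
  bool completed_ = false;
};
```

In this sketch, a future built from Devices carries the device type along with the index, so the implementation can be picked from the devices themselves, and the caller never branches on whether an implementation is present.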
ghstack-source-id: 127920097
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28100525
fbshipit-source-id: cfac73894220ef5fa8a0389b5533c5d69ba1cf04