Use only generic helpers in CUDAFuture (#57050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57050
Avoid (nearly*) any explicit mention of CUDA in CUDAFuture, and instead use "generic" classes like c10::Event, c10::Stream, and most notably c10::impl::DeviceGuardImplInterface, which allow us to manipulate CUDA entities indirectly. This is a preparatory step toward making CUDAFuture device-agnostic so that it can be merged into ivalue::Future.
* The one exception is when we construct the c10::impl::DeviceGuardImplInterface, where for now we still hardcode CUDA. This will be fixed in the very next PR.
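
As a rough illustration of the pattern described above, here is a minimal, self-contained sketch. It does not use the real c10 headers: DeviceType, Stream, DeviceGuardImpl, FakeAccelGuardImpl, GenericFuture, makeFuture and runExample are all hypothetical stand-ins invented for this example, and the real c10::impl::DeviceGuardImplInterface has a different and larger API. The point is only that the future talks to a virtual interface and that the single backend-specific spot is where the implementation object is constructed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT the real c10 types.
enum class DeviceType { CPU, FakeAccel };

struct Stream {
  DeviceType type;
  int deviceIndex;
};

// Plays the role that the commit message assigns to
// c10::impl::DeviceGuardImplInterface: a virtual interface hiding the backend.
struct DeviceGuardImpl {
  virtual ~DeviceGuardImpl() = default;
  virtual std::string name() const = 0;
  virtual Stream currentStream(int deviceIndex) const = 0;
  // A real backend would record an event on the stream; here we only log it.
  virtual void recordEvent(const Stream& s, std::vector<std::string>& log) const = 0;
};

// One concrete (fake) backend. In the commit, the backend would be CUDA.
struct FakeAccelGuardImpl final : DeviceGuardImpl {
  std::string name() const override { return "FakeAccel"; }
  Stream currentStream(int deviceIndex) const override {
    return Stream{DeviceType::FakeAccel, deviceIndex};
  }
  void recordEvent(const Stream& s, std::vector<std::string>& log) const override {
    log.push_back("record on " + name() + ":" + std::to_string(s.deviceIndex));
  }
};

// The future itself never names a backend; it only uses the interface.
class GenericFuture {
 public:
  explicit GenericFuture(std::unique_ptr<DeviceGuardImpl> impl)
      : impl_(std::move(impl)) {}

  void markCompleted(int deviceIndex) {
    Stream s = impl_->currentStream(deviceIndex);
    impl_->recordEvent(s, log_);
    completed_ = true;
  }

  bool completed() const { return completed_; }
  const std::vector<std::string>& log() const { return log_; }

 private:
  std::unique_ptr<DeviceGuardImpl> impl_;
  std::vector<std::string> log_;
  bool completed_ = false;
};

// The one place where the backend is still hardcoded, mirroring the footnote.
GenericFuture makeFuture() {
  return GenericFuture(std::unique_ptr<DeviceGuardImpl>(new FakeAccelGuardImpl()));
}

// Small usage helper: complete a future on device 0 and return what was logged.
std::vector<std::string> runExample() {
  GenericFuture fut = makeFuture();
  fut.markCompleted(0);
  assert(fut.completed());
  return fut.log();
}
```

In this sketch, swapping the backend only requires changing makeFuture; GenericFuture itself stays untouched, which is the property that would make a later merge into a device-agnostic class straightforward.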
ghstack-source-id: 127713133
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28032710
fbshipit-source-id: a240ecc32bda481e8ecf85dab94933e24f832bb0