[reland] Create CUDA-aware futures in RequestCallback (#59209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59209
Reland of https://github.com/pytorch/pytorch/pull/58426
The operations in RequestCallback can return CUDA tensors, so the futures that hold their results must be CUDA-aware.
ghstack-source-id: 130202844
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28623887
fbshipit-source-id: 53561b8ae011458d8f848f0a03830925aff2f0c2