Simplify CUDAMultiStreamGuard (#57048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57048
CUDAMultiStreamGuard had a default constructor and an `original_devices()` method, both of which were only used in a test. I'm removing them here to simplify the API and make the class easier to modify later. An additional benefit is that the class used to query and store the current stream of _all_ devices, whereas now it only does so for the relevant devices (those of the streams it is given).
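For illustration, here is a minimal, self-contained C++ sketch of the behaviour described above. It assumes a simplified model: `Stream`, `getCurrentStream`, `setCurrentStream` and `MultiStreamGuardSketch` are hypothetical stand-ins, not the c10 API (the real guard works with `c10::cuda::CUDAStream`). The guard has no default constructor and saves the original stream only for the devices of the streams it is given. It restores those streams when it goes out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a stream: a device index plus a stream id.
struct Stream {
  int device;
  int id;
};

// Hypothetical per-device "current stream" table (device -> stream id).
std::map<int, int>& currentStreams() {
  static std::map<int, int> table;
  return table;
}

Stream getCurrentStream(int device) {
  assert(device >= 0);
  return Stream{device, currentStreams()[device]};
}

void setCurrentStream(Stream s) {
  assert(s.device >= 0);
  currentStreams()[s.device] = s.id;
}

// Sketch of a multi-stream guard: no default constructor, and it only
// records the current stream of devices that appear in `streams`.
class MultiStreamGuardSketch {
 public:
  explicit MultiStreamGuardSketch(const std::vector<Stream>& streams) {
    original_.reserve(streams.size());
    for (const Stream& s : streams) {
      original_.push_back(getCurrentStream(s.device));  // save, then install
      setCurrentStream(s);
    }
  }

  ~MultiStreamGuardSketch() {
    // Restore in reverse order so that, if two streams share a device,
    // the stream that was current before the guard is the one left set.
    for (auto it = original_.rbegin(); it != original_.rend(); ++it) {
      setCurrentStream(*it);
    }
  }

  MultiStreamGuardSketch(const MultiStreamGuardSketch&) = delete;
  MultiStreamGuardSketch& operator=(const MultiStreamGuardSketch&) = delete;

  // Number of saved streams: one per stream passed in, not one per device.
  std::size_t saved_count() const { return original_.size(); }

 private:
  std::vector<Stream> original_;
};
```

With three devices and a guard over streams on devices 0 and 2, only those two devices are queried and restored; device 1 is never touched.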
ghstack-source-id: 127713136
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28029160
fbshipit-source-id: 185ef9a7ac909cd0ae6507dad9826fe978e67308