[CUDA12] Conditionally set device in autograd engine (#91191)
CUDA 12 changes the behavior of `cudaSetDevice`. In earlier versions, it only selected the device used for kernel launches and memory allocations, without creating a CUDA context. In CUDA 12, the first call to `cudaSetDevice` for a given device creates a CUDA context on that device. See issue #91122.
The autograd engine iterates over all devices and sets them:
https://github.com/pytorch/pytorch/blob/f8b348c1fc964ea12b39f37538c4e6fff1d752dc/torch/csrc/autograd/engine.cpp#L1399-L1402
https://github.com/pytorch/pytorch/blob/f8b348c1fc964ea12b39f37538c4e6fff1d752dc/torch/csrc/autograd/engine.cpp#L349
Under CUDA 12, this pollutes sibling devices with unneeded CUDA contexts.
This PR works around the issue by setting the device conditionally.
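
To illustrate the idea, here is a minimal, self-contained sketch of conditional device setting. It is not the code from this PR: `FakeRuntime`, `set_device`, `maybe_set_device`, and `visit_all_devices` are hypothetical names, and the runtime is a plain C++ stand-in that only models the CUDA 12 behavior described above (the first `set_device` call for a device creates its context). It also assumes the condition is "the device already has a context", which is one plausible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the CUDA runtime; not a real CUDA or PyTorch API.
struct FakeRuntime {
  std::vector<bool> has_context;  // has_context[d] is true once device d holds a context
  int current = 0;                // device currently selected by the calling thread

  explicit FakeRuntime(std::size_t device_count)
      : has_context(device_count, false) {}

  // Models the CUDA 12 behavior: selecting a device creates its context
  // if it does not exist yet.
  void set_device(int device) {
    assert(device >= 0 && static_cast<std::size_t>(device) < has_context.size());
    has_context[device] = true;
    current = device;
  }

  // Conditional variant (assumed condition): select the device only if it
  // already has a context. Returns true if the device was selected.
  bool maybe_set_device(int device) {
    assert(device >= 0 && static_cast<std::size_t>(device) < has_context.size());
    if (!has_context[device]) {
      return false;  // leave untouched devices alone
    }
    current = device;
    return true;
  }

  // Number of devices that currently hold a context.
  std::size_t context_count() const {
    std::size_t n = 0;
    for (bool b : has_context) {
      if (b) ++n;
    }
    return n;
  }
};

// Visits every device, either unconditionally or conditionally, and returns
// the number of contexts that exist afterwards.
std::size_t visit_all_devices(FakeRuntime& rt, bool conditional) {
  const int n = static_cast<int>(rt.has_context.size());
  for (int d = 0; d < n; ++d) {
    if (conditional) {
      rt.maybe_set_device(d);
    } else {
      rt.set_device(d);
    }
  }
  return rt.context_count();
}
```

In this model, a process that has only touched device 1 on a 4-device machine keeps a single context when the loop is conditional, whereas the unconditional loop leaves a context on all four devices.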
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91191
Approved by: https://github.com/ngimel