[CUDA12] Autograd engine use current device only (#92354)
This is a device-agnostic version of #91191.
This PR exists because the autograd engine follows a device-agnostic policy. As a result, the compile-time `USE_CUDA` flag is not supported there, so doing something like:
https://github.com/pytorch/pytorch/blob/fa1ea9f9bcaa77c1370468059be95ad9b421f500/torch/csrc/autograd/engine.cpp#L351-L357
is not effective.
Instead, this PR adds a check on the CUDA devices in the device registry so that threads set the same CUDA device.
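
As a rough illustration of the idea (not the code in this PR), the sketch below replaces a compile-time `USE_CUDA` branch with a runtime lookup in a device registry. Every name in it (`GuardImpl`, `device_registry`, `set_worker_device`, `run_worker`) is hypothetical and stands in for the real c10 machinery; the exact condition checked by the PR is not reproduced here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical stand-ins for illustration; these are not the real c10 types.
enum class DeviceType : std::size_t { CPU = 0, CUDA = 1, NumTypes = 2 };
constexpr std::size_t kNumTypes = static_cast<std::size_t>(DeviceType::NumTypes);
constexpr int kNoDevice = -1;

// Current device per backend, tracked separately for every thread.
thread_local std::array<int, kNumTypes> tls_current_device = {kNoDevice, kNoDevice};

struct GuardImpl {
  DeviceType type;
  int device_count;
  // Records `index` as the calling thread's current device for this backend.
  void set_device(int index) const {
    tls_current_device[static_cast<std::size_t>(type)] = index;
  }
};

// Runtime registry: a null slot means that backend is not available.
std::array<const GuardImpl*, kNumTypes> device_registry{};

// Runtime counterpart of an `#ifdef USE_CUDA` branch: consult the registry
// instead of a compile-time flag, and do nothing when CUDA is absent or the
// index is out of range.
void set_worker_device(int index) {
  const GuardImpl* cuda =
      device_registry[static_cast<std::size_t>(DeviceType::CUDA)];
  if (cuda != nullptr && index >= 0 && index < cuda->device_count) {
    cuda->set_device(index);
  }
}

// Starts a worker thread for `index` and reports the CUDA device it ended
// up with (kNoDevice if the check skipped the call).
int run_worker(int index) {
  int observed = kNoDevice;
  std::thread worker([&observed, index] {
    set_worker_device(index);
    observed = tls_current_device[static_cast<std::size_t>(DeviceType::CUDA)];
  });
  worker.join();
  return observed;
}
```

With nothing registered, `run_worker(0)` leaves the worker without a CUDA device. After a `GuardImpl` with two devices is stored in the CUDA slot, `run_worker(1)` reports device 1, while an out-of-range index is ignored.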
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92354
Approved by: https://github.com/albanD, https://github.com/ngimel